14 results for multiple data sources

in Cambridge University Engineering Department Publications Database


Relevance:

100.00% 100.00%

Publisher:

Abstract:

Compared with construction data sources that are usually stored and analyzed in spreadsheets and single data tables, data sources with more complicated structures, such as text documents, site images, web pages, and project schedules have been less intensively studied due to additional challenges in data preparation, representation, and analysis. In this paper, our definition and vision for advanced data analysis addressing such challenges are presented, together with related research results from previous work, as well as our recent developments of data analysis on text-based, image-based, web-based, and network-based construction sources. It is shown in this paper that particular data preparation, representation, and analysis operations should be identified, and integrated with careful problem investigations and scientific validation measures in order to provide general frameworks in support of information search and knowledge discovery from such information-abundant data sources.

Relevance:

100.00% 100.00%

Publisher:

Abstract:

The fundamental aim of clustering algorithms is to partition data points. We consider tasks where the discovered partition is allowed to vary with some covariate such as space or time. One approach would be to use fragmentation-coagulation processes, but these, being Markov processes, are restricted to linear or tree structured covariate spaces. We define a partition-valued process on an arbitrary covariate space using Gaussian processes. We use the process to construct a multitask clustering model which partitions datapoints in a similar way across multiple data sources, and a time series model of network data which allows cluster assignments to vary over time. We describe sampling algorithms for inference and apply our method to defining cancer subtypes based on different types of cellular characteristics, finding regulatory modules from gene expression data from multiple human populations, and discovering time varying community structure in a social network.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In the Climate Change Act of 2008 the UK Government pledged to reduce carbon emissions by 80% by 2050. As one step towards this, regulations are being introduced requiring all new buildings to be ‘zero carbon’ by 2019. These are defined as buildings which emit net zero carbon during their operational lifetime. However, in order to meet the 80% target it is necessary to reduce the carbon emitted during the whole life-cycle of buildings, including that emitted during the processes of construction. These elements make up the ‘embodied carbon’ of the building. While there are no regulations yet in place to restrict embodied carbon, a number of different approaches have been made. There are several existing databases of embodied carbon and embodied energy. Most provide data for the material extraction and manufacturing only, the ‘cradle to factory gate’ phase. In addition to the databases, various software tools have been developed to calculate embodied energy and carbon of individual buildings. A third source of data comes from the research literature, in which individual life cycle analyses of buildings are reported. This paper provides a comprehensive review, comparing and assessing data sources, boundaries and methodologies. The paper concludes that the wide variations in these aspects produce incomparable results. It highlights the areas where existing data is reliable, and where new data and more precise methods are needed. This comprehensive review will guide the future development of a consistent and transparent database and software tool to calculate the embodied energy and carbon of buildings.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Compared with structured data sources that are usually stored and analyzed in spreadsheets, relational databases, and single data tables, unstructured construction data sources such as text documents, site images, web pages, and project schedules have been less intensively studied due to additional challenges in data preparation, representation, and analysis. In this paper, our vision for data management and mining addressing such challenges is presented, together with related research results from previous work, as well as our recent developments of data mining on text-based, web-based, image-based, and network-based construction databases.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Obtaining accurate confidence measures for automatic speech recognition (ASR) transcriptions is an important task which stands to benefit from the use of multiple information sources. This paper investigates the application of conditional random field (CRF) models as a principled technique for combining multiple features from such sources. A novel method for combining suitably defined features is presented, allowing for confidence annotation using lattice-based features of hypotheses other than the lattice 1-best. The resulting framework is applied to different stages of a state-of-the-art large vocabulary speech recognition pipeline, and consistent improvements are shown over a sophisticated baseline system. Copyright © 2011 ISCA.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

In the Climate Change Act of 2008 the UK Government pledged to reduce carbon emissions by 80% by 2050. As one step towards this, regulations are being introduced requiring all new buildings to be ‘zero carbon’ by 2019. These are defined as buildings which emit net zero carbon during their operational lifetime. However, in order to meet the 80% target it is necessary to reduce the carbon emitted during the whole life-cycle of buildings, including that emitted during the processes of construction. These elements make up the ‘embodied carbon’ of the building. While there are no regulations yet in place to restrict embodied carbon, a number of different approaches have been made. There are several existing databases of embodied carbon and embodied energy. Most provide data for the material extraction and manufacturing only, the ‘cradle to factory gate’ phase. In addition to the databases, various software tools have been developed to calculate embodied energy and carbon of individual buildings. A third source of data comes from the research literature, in which individual life cycle analyses of buildings are reported. This paper provides a comprehensive review, comparing and assessing data sources, boundaries and methodologies. The paper concludes that the wide variations in these aspects produce incomparable results. It highlights the areas where existing data is reliable, and where new data and more precise methods are needed. This comprehensive review will guide the future development of a consistent and transparent database and software tool to calculate the embodied energy and carbon of buildings.

Relevance:

90.00% 90.00%

Publisher:

Abstract:

We present a nonparametric Bayesian method for disease subtype discovery in multi-dimensional cancer data. Our method can simultaneously analyse a wide range of data types, allowing for both agreement and disagreement between their underlying clustering structure. It includes feature selection and infers the most likely number of disease subtypes, given the data. We apply the method to 277 glioblastoma samples from The Cancer Genome Atlas, for which there are gene expression, copy number variation, methylation and microRNA data. We identify 8 distinct consensus subtypes and study their prognostic value for death, new tumour events, progression and recurrence. The consensus subtypes are prognostic of tumour recurrence (log-rank p-value of $3.6 \times 10^{-4}$ after correction for multiple hypothesis tests). This is driven principally by the methylation data (log-rank p-value of $2.0 \times 10^{-3}$) but the effect is strengthened by the other 3 data types, demonstrating the value of integrating multiple data types. Of particular note is a subtype of 47 patients characterised by very low levels of methylation. This subtype has very low rates of tumour recurrence and no new events in 10 years of follow up. We also identify a small gene expression subtype of 6 patients that shows particularly poor survival outcomes. Additionally, we note a consensus subtype that shows a highly distinctive data signature and suggest that it is therefore a biologically distinct subtype of glioblastoma. The code is available from https://sites.google.com/site/multipledatafusion/

Relevance:

90.00% 90.00%

Publisher:

Abstract:

Usually, firms that produce innovative global products are discussed within the context of developed countries. New ventures in developing countries are typically viewed as low-cost product providers that generate technologically similar products to those produced by developed economies. However, this paper argues that some Chinese university spin-outs (USOs), although rare, have adopted a novel 'catch-up' strategy to build global products on the basis of indigenous platform technologies. This paper attempts to develop a conceptual framework to address the question: how do these specific Chinese USOs develop their innovation capabilities to build global products? In order to explore the idiosyncrasies of the specific USOs, this paper uses the multiple case studies method. The primary data sources are accessed through semi-structured interviews. In addition, archival data and other materials are used as secondary sources. The study analyses the configuration of capabilities that are needed for idiosyncratic growth, and maps them to the globalisation processes. This paper provides a strategic 'roadmap' as an explanatory guide to entrepreneurs, policy makers and investors to better understand the phenomena. © 2014 Inderscience Enterprises Ltd.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

We present a map of the transformation of energy in China as a Sankey diagram. After a review of previous work, and a statement of methodology, our main work has been the identification, evaluation, and treatment of appropriate data sources. This data is used to construct the Sankey diagram, in which flows of energy are traced from energy sources through end-use conversion devices, passive systems and final services to demand drivers. The resulting diagram provides a convenient and clear snapshot of existing energy transformations in China which can usefully be compared with a similar global analysis and which emphasises the potential for improvements in energy efficiency in 'passive systems'. More broadly, it gives a basis for examining and communicating future energy scenarios, including changes to demand, changes to the supply mix, changes in efficiency and alternative provision of existing services. © 2012 Elsevier Ltd.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

The task of word-level confidence estimation (CE) for automatic speech recognition (ASR) systems stands to benefit from the combination of suitably defined input features from multiple information sources. However, the information sources of interest may not necessarily operate at the same level of granularity as the underlying ASR system. The research described here builds on previous work on confidence estimation for ASR systems using features extracted from word-level recognition lattices, by incorporating information at the sub-word level. Furthermore, the use of Conditional Random Fields (CRFs) with hidden states is investigated as a technique to combine information for word-level CE. Performance improvements are shown using the sub-word-level information in linear-chain CRFs with appropriately engineered feature functions, as well as when applying the hidden-state CRF model at the word level.

Relevance:

80.00% 80.00%

Publisher:

Abstract:

This article explores risk management in global industrial investment by identifying linkages and gaps between theories and practices. It identifies opportunities for further development of the field. Three related bodies of literature have been reviewed: risk management, global manufacturing and investment. The review suggests that risk management in global manufacturing is overlooked in the literature; that existing theoretical risk management processes are not well developed in the global manufacturing context and that the investment literature applies mainly to financial risk assessment rather than investment risk management structures. Further, there appears to be a serious lack of systematic industrial risk management in investment decision making. This article highlights the opportunities to deploy current good practices more effectively as well as the need to develop more robust theories of industrial investment risk management. The approach adopted to investigate this multidisciplinary topic included a historical review of literature to understand the diverse background of theoretical development. A case study research approach was adopted to collect data, involving four global manufacturing companies and one risk management advisory company to observe the patterns and rationale of current practices. Supporting arguments from secondary data sources reinforced the findings. The research focuses on risk management in global industrial investment. It links theories with practice to understand the existing knowledge gap and proposes key research themes for further research. © 2013 Macmillan Publishers Ltd. 1460-3799 Risk Management.