38 results for BENCHMARKING
Abstract:
We present a benchmark system for global vegetation models. This system provides a quantitative evaluation of multiple simulated vegetation properties, including primary production; seasonal net ecosystem production; vegetation cover, composition and height; fire regime; and runoff. The benchmarks are derived from remotely sensed gridded datasets and site-based observations. The datasets allow comparisons of annual average conditions and seasonal and inter-annual variability, and they allow the impact of spatial and temporal biases in means and variability to be assessed separately. Specifically designed metrics quantify model performance for each process, and are compared to scores based on the temporal or spatial mean value of the observations and a "random" model produced by bootstrap resampling of the observations. The benchmark system is applied to three models: a simple light-use efficiency and water-balance model (the Simple Diagnostic Biosphere Model: SDBM), and the Lund-Potsdam-Jena (LPJ) and Land Processes and eXchanges (LPX) dynamic global vegetation models (DGVMs). In general, the SDBM performs better than either of the DGVMs. It reproduces independent measurements of net primary production (NPP) but underestimates the amplitude of the observed CO2 seasonal cycle. The two DGVMs show little difference for most benchmarks (including the inter-annual variability in the growth rate and seasonal cycle of atmospheric CO2), but LPX represents burnt fraction demonstrably more accurately. Benchmarking also identified several weaknesses common to both DGVMs. The benchmarking system provides a quantitative approach for evaluating how adequately processes are represented in a model, identifying errors and biases, tracking improvements in performance through model development, and discriminating among models. Adoption of such a system would do much to improve confidence in terrestrial model predictions of climate change impacts and feedbacks.
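The comparison of a model's score against a "mean" null model and a bootstrap-resampled "random" model can be sketched as follows. This is a minimal illustration, not the paper's exact formulation: the normalised-mean-error metric and the synthetic NPP values are assumptions introduced here for demonstration.

```python
import numpy as np

def nme(obs, sim):
    """Normalised mean error: mean |sim - obs| scaled by the
    mean absolute deviation of the observations."""
    return np.mean(np.abs(sim - obs)) / np.mean(np.abs(obs - obs.mean()))

rng = np.random.default_rng(42)
obs = rng.gamma(2.0, 300.0, 500)             # synthetic gridded NPP "observations"
model = obs * 1.1 + rng.normal(0, 50, 500)   # synthetic model output with bias + noise

score_model = nme(obs, model)
# Null model 1: predict the mean of the observations everywhere (NME = 1 by construction)
score_mean = nme(obs, np.full_like(obs, obs.mean()))
# Null model 2: "random" model from bootstrap resampling of the observations
score_random = np.mean([nme(obs, rng.choice(obs, obs.size)) for _ in range(200)])
# A useful model should beat both null scores: score_model < score_mean < score_random
```

Scoring against both null models separates "better than knowing nothing" from "better than knowing only the observed mean", which is the discriminating role the benchmark scores play above.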
Abstract:
Performance modelling is a useful tool in the lifecycle of high performance scientific software, such as weather and climate models, especially as a means of ensuring efficient use of available computing resources. In particular, sufficiently accurate performance prediction could reduce the effort and experimental computer time required when porting and optimising a climate model to a new machine. In this paper, traditional techniques are used to predict the computation time of a simple shallow water model which is illustrative of the computation (and communication) involved in climate models. These models are compared with real execution data gathered on AMD Opteron-based systems, including several phases of the U.K. academic community HPC resource, HECToR. The method has some success in relating source code to achieved performance for the K10 series of Opterons, but is found to be inadequate for the next-generation Interlagos processor. This experience leads to the investigation of a data-driven application benchmarking approach to performance modelling. Results for an early version of the approach are presented using the shallow water model as an example.
Abstract:
Two so-called “integrated” polarimetric rate estimation techniques, ZPHI (Testud et al., 2000) and ZZDR (Illingworth and Thompson, 2005), are evaluated using 12 episodes of the year 2005 observed by the French C-band operational Trappes radar, located near Paris. The term “integrated” means that the concentration parameter of the drop size distribution is assumed to be constant over some area and the algorithms retrieve it using the polarimetric variables in that area. The evaluation is carried out in ideal conditions (no partial beam blocking, no ground-clutter contamination, no bright-band contamination, a posteriori calibration of the radar variables ZH and ZDR) using hourly rain gauges located at distances less than 60 km from the radar. Also included in the comparison, for the sake of benchmarking, is a conventional Z = 282R^1.66 estimator, with and without attenuation correction and with and without adjustment by rain gauges as currently done operationally at Météo France. Under those ideal conditions, the two polarimetric algorithms, which rely solely on radar data, appear to perform as well as, if not better than, the conventional algorithms, depending on the measurement conditions (attenuation, rain rates, …), even when the latter take rain gauges into account through the adjustment scheme. ZZDR with attenuation correction is the best estimator for hourly rain gauge accumulations lower than 5 mm h−1 and ZPHI is the best one above that threshold. A perturbation analysis has been conducted to assess the sensitivity of the various estimators to biases on ZH and ZDR, taking into account the typical accuracy and stability that can reasonably be achieved with modern operational radars (1 dB on ZH and 0.2 dB on ZDR). A +1 dB positive bias on ZH (radar too hot) results in a +14% overestimation of the rain rate with the conventional estimator used in this study (Z = 282R^1.66), a −19% underestimation with ZPHI and a +23% overestimation with ZZDR.
Additionally, a +0.2 dB positive bias on ZDR results in a typical rain-rate underestimation of 15% by ZZDR.
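The quoted sensitivity of the conventional estimator follows directly from inverting the Z–R relation; a minimal sketch in Python (the 40 dBZ example value is illustrative, not from the study):

```python
def rain_rate(dbz, a=282.0, b=1.66):
    """Invert the Z-R relation Z = a * R**b.
    dbz is reflectivity in dBZ; returns rain rate R in mm/h."""
    z_lin = 10.0 ** (dbz / 10.0)        # dBZ -> linear reflectivity (mm^6 m^-3)
    return (z_lin / a) ** (1.0 / b)

r_true = rain_rate(40.0)                # ~8.6 mm/h at 40 dBZ
r_biased = rain_rate(41.0)              # radar running 1 dB "hot"
overestimate = r_biased / r_true - 1.0  # 10**(0.1/1.66) - 1, i.e. ~+15%
```

The multiplicative bias 10^(0.1/1.66) ≈ 1.149 is independent of the rain rate, consistent with the roughly +14% figure reported above for the conventional estimator.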
Abstract:
This paper develops a framework for evaluating sustainability assessment methods by separately analyzing their normative, systemic and procedural dimensions as suggested by Wiek and Binder [Wiek, A, Binder, C. Solution spaces for decision-making – a sustainability assessment tool for city-regions. Environ Impact Assess Rev 2005, 25: 589-608.]. The framework is then used to characterize indicator-based sustainability assessment methods in agriculture. For a long time, sustainability assessment in agriculture has focused mostly on environmental and technical issues, thus neglecting the economic and, above all, the social aspects of sustainability, the multifunctionality of agriculture and the applicability of the results. In response to these shortcomings, several integrative sustainability assessment methods have been developed for the agricultural sector. This paper reviews seven of these that represent the diversity of tools developed in this area. The reviewed assessment methods can be categorized into three types: (i) top-down farm assessment methods; (ii) top-down regional assessment methods with some stakeholder participation; (iii) bottom-up, integrated participatory or transdisciplinary methods with stakeholder participation throughout the process. The results readily show the trade-offs encountered when selecting an assessment method. A clear, standardized, top-down procedure allows results to be benchmarked and compared across regions and sites. However, this comes at the cost of system specificity. As the top-down methods often have low stakeholder involvement, the application and implementation of the results might be difficult. Our analysis suggests that to include the aspects mentioned above in agricultural sustainability assessment, the bottom-up, integrated participatory or transdisciplinary methods are the most suitable ones.
Abstract:
Several methods for assessing the sustainability of agricultural systems have been developed. These methods do not fully: (i) take into account the multi‐functionality of agriculture; (ii) include multidimensionality; (iii) utilize and implement the assessment knowledge; and (iv) identify conflicting goals and trade‐offs. This paper reviews seven recently developed multidisciplinary indicator‐based assessment methods with respect to their contribution to these shortcomings. All approaches include (1) normative aspects such as goal setting, (2) systemic aspects such as a specification of the scale of analysis, and (3) a reproducible structure. The approaches can be categorized into three typologies. The top‐down farm assessments focus on field or farm assessment. They have a clear procedure for measuring the indicators and assessing the sustainability of the system, which allows for benchmarking across farms. The degree of participation is low, potentially affecting the implementation of the results negatively. The top‐down regional assessment methods assess both on‐farm and regional effects. They include some participation to increase acceptance of the results. However, they lack an analysis of potential trade‐offs. The bottom‐up, integrated participatory or transdisciplinary approaches focus on a regional scale. Stakeholders are included throughout the whole process, ensuring the acceptance of the results and increasing the probability of implementation of developed measures. As they include the interaction between the indicators in their system representation, they allow for performing a trade‐off analysis. The bottom‐up, integrated participatory or transdisciplinary approaches seem to better overcome the four shortcomings mentioned above.
Abstract:
Aircraft Maintenance, Repair and Overhaul (MRO) feedback commonly includes an engineer’s complex text-based inspection report. Capturing and normalizing the content of these textual descriptions is vital to cost and quality benchmarking, and provides information to facilitate continuous improvement of MRO processes and analytics. As data analysis and mining tools require highly normalized data, raw textual data is inadequate. This paper offers a text-mining solution to efficiently analyse bulk textual feedback data. Despite replacement of the same parts and/or sub-parts, the actual service cost for the same repair is often distinctly different from similar previous jobs. Regular expression algorithms were combined with an aircraft MRO glossary dictionary to help provide additional information concerning the reason for cost variation. Professional terms and conventions were included within the dictionary to avoid ambiguity and improve the quality of the results. Testing shows that most descriptive inspection reports can be appropriately interpreted, allowing extraction of highly normalized data. This additional normalized data strongly supports data analysis and data mining, whilst also increasing the accuracy of future quotation costing. This solution has been effectively used by a large aircraft MRO agency with positive results.
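The regex-plus-glossary normalization described above can be sketched in a few lines. All glossary entries, abbreviations and the part-number convention below are hypothetical examples, not taken from the paper's actual dictionary:

```python
import re

# Hypothetical mini-glossary mapping MRO shorthand to normalised terms
GLOSSARY = {
    r"\br/r\b": "removed and replaced",
    r"\bc/w\b": "complied with",
    r"\binsp\b": "inspection",
    r"\bcorr\b": "corrosion",
}
# Assumed part-number convention: 2-4 letters, hyphen, 3-5 digits
PART_NO = re.compile(r"\b[A-Z]{2,4}-\d{3,5}\b")

def normalise(report: str):
    """Expand shorthand via the glossary and extract part numbers."""
    text = report.lower()
    for pattern, expansion in GLOSSARY.items():
        text = re.sub(pattern, expansion, text)
    return text, PART_NO.findall(report)

text, parts = normalise("Insp found corr on bracket ABC-1042; r/r per AMM.")
# parts -> ['ABC-1042']; text now contains the expanded, normalised phrasing
```

Expanding ambiguous shorthand before extraction is what makes the output "highly normalized" enough for downstream cost analysis and mining.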
Abstract:
While style analysis has been studied extensively in equity markets, applications of this valuable tool for measuring and benchmarking performance and risk in a real estate context are still relatively new. Most previous real estate studies on this topic have identified three investment categories (rather than styles): sectors, administrative regions and economic regions. However, the low explanatory power of these categories reveals the need to extend the analysis to other investment styles. We identify four main real estate investment styles and apply a multivariate model to randomly generated portfolios to test the significance of each style in explaining portfolio returns. First, results show that alpha performance is significantly reduced when we account for the new investment styles, with small vs. big properties being the dominant one. Second, we find that the probability of obtaining alpha performance depends on the actual exposure of funds to style factors. Finally, we find that both alpha and systematic risk levels are linked to the actual characteristics of portfolios. Our overall results suggest that it would be beneficial for real estate fund managers to use these style factors to set benchmarks and to analyze portfolio returns.
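Style analysis of this kind regresses portfolio returns on style-factor returns and reads the intercept as alpha. A minimal numpy sketch with synthetic data follows; the four factors, their parameters and the exposures are illustrative stand-ins for the paper's styles, and a full returns-based style analysis would additionally impose Sharpe's non-negativity and sum-to-one constraints on the weights:

```python
import numpy as np

rng = np.random.default_rng(0)
T, K = 120, 4                                # months of data, style factors
F = rng.normal(0.005, 0.02, (T, K))          # synthetic monthly style-factor returns
true_w = np.array([0.1, 0.2, 0.4, 0.3])      # hypothetical style exposures
rp = F @ true_w + rng.normal(0.0, 0.002, T)  # portfolio returns, no true alpha

X = np.column_stack([np.ones(T), F])         # intercept column captures alpha
coef, *_ = np.linalg.lstsq(X, rp, rcond=None)
alpha, w_hat = coef[0], coef[1:]             # alpha ~ 0, w_hat ~ true_w
```

If apparent alpha shrinks once the style columns are included, the "alpha" was really style exposure, which is the mechanism behind the reduction in alpha performance reported above.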
Abstract:
There is potential to reduce both operational and embodied greenhouse gas emissions from buildings. To date the focus has been on reducing the operational element, although given the urgency of carbon reductions, it may be more beneficial to consider upfront embodied carbon reductions. This paper describes a case study on the whole life carbon cycle of a warehouse building in Swindon, UK. It examines the relationship between embodied carbon (Ec) and operational carbon (Oc), the proportions of Ec from the structural and non-structural elements, carbon benchmarking of the structure, the value of ‘cradle to site’ or ‘cradle to grave’ assessments and the significance of the timing of emissions during the life of the building. The case study indicates that Ec was dominant for the building and that the structure was responsible for more than half of the Ec. Weighting of future emissions appears to be an important factor to consider. The PAS 2050 reduction factors had only a modest effect, but weighting to allow for future decarbonisation of the national grid energy supply had a large effect. This suggests that future operational carbon emissions are being overestimated compared to embodied emissions.
Abstract:
This research presents a novel multi-functional system for medical Imaging-enabled Assistive Diagnosis (IAD). Although the IAD demonstrator has focused on abdominal images and supports the clinical diagnosis of kidneys using CT/MRI imaging, it can be adapted to work on image delineation, annotation and 3D real-size volumetric modelling of other organ structures such as the brain, spine, etc. The IAD provides advanced real-time 3D visualisation and measurements with fully automated functionalities, developed in two stages. In the first stage, via the clinically driven user interface, specialist clinicians use CT/MRI imaging datasets to accurately delineate and annotate the kidneys and their possible abnormalities, thus creating “3D Golden Standard Models”. Based on these models, in the second stage, clinical support staff, i.e. medical technicians, interactively define model-based rules and parameters for the integrated “Automatic Recognition Framework” to achieve results which are closest to those of the clinicians. These specific rules and parameters are stored in “Templates” and can later be used by any clinician to automatically identify organ structures, i.e. kidneys, and their possible abnormalities. The system also supports the transmission of these “Templates” to another expert for a second opinion. A 3D model of the body, the organs and their possible pathology with real metrics is also integrated. The automatic functionality was tested on eleven MRI datasets (comprising 286 images) and the 3D models were validated by comparing them with the metrics from the corresponding “3D Golden Standard Models”. The system provides metrics for the evaluation of the results, in terms of Accuracy, Precision, Sensitivity, Specificity and Dice Similarity Coefficient (DSC), so as to enable benchmarking of its performance.
The first IAD prototype has produced promising results as its performance accuracy based on the most widely deployed evaluation metric, DSC, yields 97% for the recognition of kidneys and 96% for their abnormalities; whilst across all the above evaluation metrics its performance ranges between 96% and 100%. Further development of the IAD system is in progress to extend and evaluate its clinical diagnostic support capability through development and integration of additional algorithms to offer fully computer-aided identification of other organs and their abnormalities based on CT/MRI/Ultra-sound Imaging.
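The Dice Similarity Coefficient used for the benchmarking above compares an automatic segmentation mask against the gold-standard mask: DSC = 2|A ∩ B| / (|A| + |B|). A minimal sketch with toy binary masks (the 8×8 grids are illustrative, not the system's data):

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice Similarity Coefficient between two binary masks."""
    a, b = mask_a.astype(bool), mask_b.astype(bool)
    inter = np.logical_and(a, b).sum()
    return 2.0 * inter / (a.sum() + b.sum())

gold = np.zeros((8, 8), dtype=bool)
gold[2:6, 2:6] = True   # 16-pixel "gold standard" region
auto = np.zeros((8, 8), dtype=bool)
auto[2:6, 3:7] = True   # automatic result, shifted one pixel

print(dice(gold, auto))  # 2*12 / (16+16) = 0.75
```

A DSC of 97%, as reported for kidney recognition, therefore means near-total overlap between the automatic and gold-standard delineations.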
Abstract:
This paper examines the impact of changes in the composition of real estate stock indices, considering companies both joining and leaving the indices. Stocks that are newly included not only see a short-term increase in their share price but also a permanent increase in trading volume following the event. This highlights the importance of indices not only in a benchmarking context but also in enhancing investor awareness and aiding liquidity. By contrast, as anticipated, the share prices of firms removed from indices fall around the time of the index change. The fact that the changes in share prices, either upwards for index inclusions or downwards for deletions, are generally not reversed indicates that the movements are not purely due to price pressure, but rather are more consistent with the information content hypothesis. There is no evidence, however, that index changes significantly affect the volatility of price changes or firms' operating performance as measured by earnings per share.
Abstract:
Much UK research and market practice on portfolio strategy and performance benchmarking relies on a sector‐geography subdivision of properties. Prior tests of the appropriateness of such divisions have generally relied on aggregated or hypothetical return data. However, the results found in aggregate may not hold when individual buildings are considered. This paper makes use of a dataset of individual UK property returns. A series of multivariate exploratory statistical techniques are utilised to test whether the return behaviour of individual properties conforms to their a priori grouping. The results suggest strongly that neither standard sector nor regional classifications provide a clear demarcation of individual building performance. This has important implications for both portfolio strategy and performance measurement and benchmarking. However, there do appear to be size and yield effects that help explain return behaviour at the property level.
Abstract:
Recently, the original benchmarking methodology of the Sustainable Value approach has become the subject of serious debate. While Kuosmanen and Kuosmanen (2009b) critically question its validity by introducing productive efficiency theory, Figge and Hahn (2009) counter that implementing productive efficiency theory severely conflicts with the original financial economics perspective of the Sustainable Value approach. We argue that the debate is confusing because the original Sustainable Value approach pursues two largely incompatible objectives. Nevertheless, we maintain that both ways of benchmarking can provide useful and, moreover, complementary insights. If one intends to present the overall resource efficiency of the firm from the investor's viewpoint, we recommend the original benchmarking methodology. If, on the other hand, one aspires to create a prescriptive tool setting up some sort of reallocation scheme, we advocate implementation of productive efficiency theory. Although the discussion on benchmark application is certainly substantial, the debate should not be narrowed to this issue alone. Beyond the benchmark concern, we see several other challenges for the development of the Sustainable Value approach: (1) a more systematic resource selection, (2) the inclusion of the value chain and (3) additional policy-related analyses to increase interpretative power.
Implication of methodological uncertainties for mid-Holocene sea surface temperature reconstructions
Abstract:
We present and examine a multi-sensor global compilation of mid-Holocene (MH) sea surface temperatures (SST), based on Mg/Ca and alkenone palaeothermometry and reconstructions obtained using planktonic foraminifera and organic-walled dinoflagellate cyst census counts. We assess the uncertainties originating from the use of different methodologies and evaluate the potential of MH SST reconstructions as a benchmark for climate-model simulations. The comparison between different analytical approaches (time frame, baseline climate) shows that the choice of time window for the MH has a negligible effect on the reconstructed SST pattern, but the choice of baseline climate affects both the magnitude and the spatial pattern of the reconstructed SSTs. Comparison of the SST reconstructions made using different sensors shows significant discrepancies at a regional scale, with uncertainties often exceeding the reconstructed SST anomaly. Apparent patterns in SST may largely reflect the use of different sensors in different regions. Overall, the uncertainties associated with the SST reconstructions are generally larger than the MH anomalies. Thus, the SST data currently available cannot serve as a target for benchmarking model simulations. Further evaluation of potential subsurface and/or seasonal artifacts that may obscure the MH SST reconstructions is urgently needed to provide reliable benchmarks for model evaluation.
Abstract:
The assessment of chess players is an increasingly attractive opportunity and an unfortunate necessity. The chess community needs to limit potential reputational damage by inhibiting cheating and unjustified accusations of cheating: there has been a recent rise in both. A number of counter-intuitive discoveries have been made by benchmarking the intrinsic merit of players’ moves, and these call for further investigation. Is Capablanca actually, objectively the most accurate World Champion? Has Elo rating inflation not taken place? Stimulated by FIDE/ACP, we revisit the fundamentals of the subject to advance a framework suitable for improved standards of computational experiment and more precise results. Other domains look to chess as the demonstrator of good practice, including the rating of professionals making high-value decisions under pressure, personnel evaluation by Multichoice Assessment and the organization of crowd-sourcing in citizen science projects. The ‘3P’ themes of performance, prediction and profiling pervade all these domains.