753 results for Evaluation metrics


Relevance:

70.00%

Publisher:

Abstract:

Cross-Lingual Link Discovery (CLLD) is a new problem in Information Retrieval. The aim is to automatically identify meaningful and relevant hypertext links between documents in different languages. This is particularly helpful in knowledge discovery if a multi-lingual knowledge base is sparse in one language or another, or the topical coverage in each language is different; such is the case with Wikipedia. Techniques for identifying new and topically relevant cross-lingual links are a current topic of interest at NTCIR where the CrossLink task has been running since the 2011 NTCIR-9. This paper presents the evaluation framework for benchmarking algorithms for cross-lingual link discovery evaluated in the context of NTCIR-9. This framework includes topics, document collections, assessments, metrics, and a toolkit for pooling, assessment, and evaluation. The assessments are further divided into two separate sets: manual assessments performed by human assessors; and automatic assessments based on links extracted from Wikipedia itself. Using this framework we show that manual assessment is more robust than automatic assessment in the context of cross-lingual link discovery.

Relevance:

70.00%

Publisher:

Abstract:

This study aimed to provide a detailed evaluation and comparison of a range of modulated beam evaluation metrics, in terms of their correlation with QA testing results and their variation between treatment sites, for a large number of treatments. Ten metrics including the modulation index (MI), fluence map complexity (FMC), modulation complexity score (MCS), mean aperture displacement (MAD) and small aperture score (SAS) were evaluated for 546 beams from 122 IMRT and VMAT treatment plans targeting the anus, rectum, endometrium, brain, head and neck and prostate. The calculated sets of metrics were evaluated in terms of their relationships to each other and their correlation with the results of electronic portal imaging based quality assurance (QA) evaluations of the treatment beams. Evaluation of the MI, MAD and SAS suggested that beams used in treatments of the anus, rectum, head and neck were more complex than the prostate and brain treatment beams. Seven of the ten beam complexity metrics were found to be strongly correlated with the results from QA testing of the IMRT beams (p < 0.00008). For example, values of SAS (with MLC apertures narrower than 10 mm defined as “small”) less than 0.2 identified QA passing IMRT beams with 100% specificity. However, few of the metrics were correlated with the results from QA testing of the VMAT beams, whether they were evaluated as whole 360° arcs or as 60° sub-arcs. Selective evaluation of beam complexity metrics (at least MI, MCS and SAS) is therefore recommended as an intermediate step in the IMRT QA chain. Such evaluation may also be useful as a means of periodically reviewing VMAT planning or optimiser performance.
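
The abstract does not give the SAS formula, so the following is only a minimal sketch under a stated assumption: SAS is taken to be the fraction of open MLC leaf pairs whose opening is narrower than a threshold (10 mm here, as in the abstract). The function names and the flat list of openings are hypothetical, and any per-segment weighting the authors may have used is omitted.

```python
from typing import Sequence


def small_aperture_score(openings_mm: Sequence[float],
                         threshold_mm: float = 10.0) -> float:
    """Fraction of open leaf pairs narrower than threshold_mm.

    Assumption: closed leaf pairs (opening <= 0) are ignored, and all
    open leaf pairs count equally (no monitor-unit weighting).
    """
    open_pairs = [w for w in openings_mm if w > 0]
    if not open_pairs:
        return 0.0
    small = sum(1 for w in open_pairs if w < threshold_mm)
    return small / len(open_pairs)


def below_sas_cutoff(openings_mm: Sequence[float],
                     cutoff: float = 0.2) -> bool:
    """True when the beam's SAS is below the 0.2 value quoted in the abstract."""
    return small_aperture_score(openings_mm) < cutoff


if __name__ == "__main__":
    # Hypothetical leaf-pair openings in millimetres; 0 means closed.
    beam = [5.0, 8.0, 12.0, 20.0, 0.0]
    print(small_aperture_score(beam), below_sas_cutoff(beam))
```

In this sketch a beam with many narrow openings gets a high score, which matches the abstract's reading of SAS as a complexity indicator.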

Relevance:

60.00%

Publisher:

Abstract:

This paper presents an object tracking system that utilises a hybrid multi-layer motion segmentation and optical flow algorithm. While many tracking systems seek to combine multiple modalities such as motion and depth or multiple inputs within a fusion system to improve tracking robustness, current systems have avoided the combination of motion and optical flow. This combination allows the use of multiple modes within the object detection stage. Consequently, different categories of objects, whether in motion or stationary, can be effectively detected utilising either optical flow, static foreground or active foreground information. The proposed system is evaluated using the ETISEO database and evaluation metrics and compared to a baseline system utilising a single mode foreground segmentation technique. Results demonstrate that a significant improvement in tracking can be made through the incorporation of the additional motion information.

Relevance:

60.00%

Publisher:

Abstract:

This paper presents an overview of the NTCIR-9 Cross-lingual Link Discovery (Crosslink) task. The overview includes: the motivation for cross-lingual link discovery; the Crosslink task definition; the run submission specification; the assessment and evaluation framework; the evaluation metrics; and the evaluation results of submitted runs. Cross-lingual link discovery (CLLD) is a way of automatically finding potential links between documents in different languages. The goal of this task is to create a reusable resource for evaluating automated CLLD approaches. The results of this research can be used in building and refining systems for automated link discovery. The task is focused on linking between English source documents and Chinese, Korean, and Japanese target documents.

Relevance:

60.00%

Publisher:

Abstract:

This paper describes an evaluation approach for benchmarking the effectiveness of cross-lingual link discovery (CLLD). Cross-lingual link discovery is a way of automatically finding prospective links between documents in different languages, which is particularly helpful for knowledge discovery across different language domains. A CLLD evaluation framework is proposed for system performance benchmarking. The framework includes standard document collections, evaluation metrics, and link assessment and evaluation tools. The evaluation methods described in this paper have been utilised to quantify system performance at the NTCIR-9 Crosslink task. It is shown that using manual assessment to generate the gold standard can deliver a more reliable evaluation result.

Relevance:

60.00%

Publisher:

Abstract:

Nowadays people rely heavily on the Internet for information and knowledge. Wikipedia is an online multilingual encyclopaedia that contains a very large number of detailed articles covering most written languages. It is often considered to be a treasury of human knowledge. It includes extensive hypertext links between documents of the same language for easy navigation. However, the pages in different languages are rarely cross-linked except for direct equivalent pages on the same subject in different languages. This could pose serious difficulties to users seeking information or knowledge from sources in different languages, or where there is no equivalent page in one language or another. In this thesis, a new information retrieval task, cross-lingual link discovery (CLLD), is proposed to tackle the problem of the lack of cross-lingual anchored links in a knowledge base such as Wikipedia. In contrast to traditional information retrieval tasks, cross-lingual link discovery algorithms actively recommend a set of meaningful anchors in a source document and establish links to documents in an alternative language. In other words, cross-lingual link discovery is a way of automatically finding hypertext links between documents in different languages, which is particularly helpful for knowledge discovery in different language domains. This study is specifically focused on Chinese / English link discovery (C/ELD). Chinese / English link discovery is a special case of the cross-lingual link discovery task. It involves tasks including natural language processing (NLP), cross-lingual information retrieval (CLIR) and cross-lingual link discovery. To assess the effectiveness of CLLD, a standard evaluation framework is also proposed. The evaluation framework includes topics, document collections, a gold standard dataset, evaluation metrics, and toolkits for run pooling, link assessment and system evaluation.
With the evaluation framework, the performance of CLLD approaches and systems can be quantified. This thesis contributes to the research on natural language processing and cross-lingual information retrieval in CLLD: 1) a new simple but effective Chinese segmentation method, n-gram mutual information, is presented for determining the boundaries of Chinese text; 2) a voting mechanism for named entity translation is demonstrated for achieving high precision in English / Chinese machine translation; 3) a link mining approach that mines the existing link structure for anchor probabilities achieves encouraging results in suggesting cross-lingual Chinese / English links in Wikipedia. This approach was examined in the experiments on better automatic generation of cross-lingual links that were carried out as part of the study. The overall major contribution of this thesis is the provision of a standard evaluation framework for cross-lingual link discovery research. Such a framework is important in CLLD evaluation because it helps in benchmarking the performance of various CLLD systems and in identifying good CLLD realisation approaches. The evaluation methods and the evaluation framework described in this thesis have been utilised to quantify system performance in the NTCIR-9 Crosslink task, which is the first information retrieval track of this kind.

Relevance:

40.00%

Publisher:

Abstract:

This paper proposes an experimental study of quality metrics that can be applied to visual and infrared images acquired from cameras onboard an unmanned ground vehicle (UGV). The relevance of existing metrics in this context is discussed and a novel metric is introduced. Selected metrics are evaluated on data collected by a UGV in clear and challenging environmental conditions, represented in this paper by the presence of airborne dust or smoke. An example of application is given with monocular SLAM estimating the pose of the UGV while smoke is present in the environment. It is shown that the proposed novel quality metric can be used to anticipate situations where the quality of the pose estimate will be significantly degraded due to the input image data. This leads to decisions of advantageously switching between data sources (e.g. using infrared images instead of visual images).

Relevance:

40.00%

Publisher:

Abstract:

This paper proposes an experimental study of quality metrics that can be applied to visual and infrared images acquired from cameras onboard an unmanned ground vehicle (UGV). The relevance of existing metrics in this context is discussed and a novel metric is introduced. Selected metrics are evaluated on data collected by a UGV in clear and challenging environmental conditions, represented in this paper by the presence of airborne dust or smoke.

Relevance:

30.00%

Publisher:

Abstract:

Measuring social and environmental metrics of property is necessary for meaningful triple bottom line (TBL) assessments. This paper demonstrates how relevant indicators derived from environmental rating systems provide for reasonably straightforward collations of performance scores that support adjustments based on a sliding scale. It also highlights the absence of a corresponding consensus on important social metrics representing the third leg of the TBL tripod. Assessing TBL may be unavoidably imprecise, but if valuers and managers continue to ignore TBL concerns, their assessments may soon be less relevant given the emerging institutional milieu informing and reflecting business practices and societal expectations.

Relevance:

30.00%

Publisher:

Abstract:

This paper presents a unified and systematic assessment of ten position control strategies for a hydraulic servo system with a single-ended cylinder driven by a proportional directional control valve. We aim at identifying those methods that achieve better tracking, have a low sensitivity to system uncertainties, and offer a good balance between development effort and end results. A formal approach for solving this problem relies on several practical metrics, which are introduced herein. Their choice is important, as the comparison results between controllers can vary significantly depending on the selected criterion. Apart from the quantitative assessment, we also raise aspects that are difficult to quantify but must be kept in mind when considering the position control problem for this class of hydraulic servo systems.

Relevance:

30.00%

Publisher:

Abstract:

Increasing global competitiveness has forced manufacturing organizations to produce high-quality products more quickly and at a competitive cost, which demands continuous improvement techniques. In this paper, we propose a fuzzy-based performance evaluation method for lean supply chains. To understand the overall performance of a cost-competitive supply chain, we investigate the alignment of market strategy and the position of the supply chain. Competitive strategies can be achieved by using a different weight calculation for different supply chain situations. By identifying optimal performance metrics and applying performance evaluation methods, managers can predict the overall supply chain performance under a lean strategy.

Relevance:

30.00%

Publisher:

Abstract:

IEEE 802.11p is the new standard for inter-vehicular communications (IVC) using the 5.9 GHz frequency band; it is planned to be widely deployed to enable cooperative systems. The uses and performance of 802.11p have been studied theoretically and in simulations over the past years. Unfortunately, many of these results have not been confirmed by on-track experimentation. In this paper, we describe field trials of 802.11p technology with our test vehicles. Metrics such as maximum range, latency and frame loss are examined.

Relevance:

30.00%

Publisher:

Abstract:

Traffic safety studies demand more than existing micro-simulation models can offer, as these models postulate that every driver exhibits safe behaviour. All microscopic traffic simulation models include a car-following model, and the Gazis–Herman–Rothery (GHR) car-following model is widely used. This paper highlights the limitations of the GHR car-following model in modelling longitudinal driving behaviour for safety study purposes. This study reviews and compares different versions of the GHR model. To enable the GHR model to reproduce safety metrics precisely, a new set of car-following model parameters is offered to simulate unsafe vehicle conflicts. NGSIM vehicle trajectory data are used to evaluate the new model, and short following headways and Time to Collision are employed to assess critical safety events within traffic flow. Risky events are extracted from available NGSIM data to evaluate the modified model against the generic versions of the GHR model. The results from simulation tests illustrate that the proposed model predicts the safety metrics better than the generic GHR model. Additionally, it can potentially facilitate assessing and predicting traffic facilities’ safety using microscopic simulation. The new model can predict near-miss rear-end crashes.
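
The two safety metrics named in the abstract, following headway and Time to Collision, can be illustrated with their common textbook definitions. This is a sketch under assumptions: the abstract does not state the exact definitions or thresholds used, and the function and parameter names below are hypothetical.

```python
from typing import Optional


def time_to_collision(gap_m: float,
                      follower_speed_mps: float,
                      leader_speed_mps: float) -> Optional[float]:
    """Seconds until the follower reaches the leader at constant speeds.

    Returns None when the follower is not closing on the leader, since
    Time to Collision is undefined in that case.
    """
    closing_speed = follower_speed_mps - leader_speed_mps
    if closing_speed <= 0:
        return None
    return gap_m / closing_speed


def time_headway(spacing_m: float,
                 follower_speed_mps: float) -> Optional[float]:
    """Seconds the follower needs to cover the current spacing.

    Returns None for a stationary follower.
    """
    if follower_speed_mps <= 0:
        return None
    return spacing_m / follower_speed_mps


if __name__ == "__main__":
    # Hypothetical car-following snapshot: 20 m gap, follower 5 m/s faster.
    print(time_to_collision(20.0, 15.0, 10.0))  # 4.0
    print(time_headway(30.0, 15.0))             # 2.0
```

A study of risky events would apply some cut-off to these values; none is given in the abstract, so none is assumed here.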

Relevance:

30.00%

Publisher:

Abstract:

High Dynamic Range (HDR) imaging was used to collect luminance information at workstations in two open-plan office buildings in Queensland, Australia: one lit by skylights, vertical windows and electric light, and another by skylights and electric light. This paper compares illuminance and luminance data collected in these offices with occupant feedback to evaluate these open-plan environments based on available and emerging metrics for visual comfort and glare. This study highlights issues of daylighting quality and measurement specific to open-plan spaces. The results demonstrate that overhead glare is a serious threat to user acceptance of skylights, and that electric and daylight integration and controls have a major impact on the perception of daylighting quality. With regard to measurement of visual comfort, it was found that the Daylight Glare Probability (DGP) gave poor agreement with occupant reports of discomfort glare in open-plan spaces with skylights, and the CIE Glare Index (CGI) gave the best agreement. Horizontal and vertical illuminances gave no indication of visual comfort in these spaces.

Relevance:

30.00%

Publisher:

Abstract:

Multiple reaction monitoring (MRM) mass spectrometry coupled with stable isotope dilution (SID) and liquid chromatography (LC) is increasingly used in biological and clinical studies for precise and reproducible quantification of peptides and proteins in complex sample matrices. Robust LC-SID-MRM-MS-based assays that can be replicated across laboratories and ultimately in clinical laboratory settings require standardized protocols to demonstrate that the analysis platforms are performing adequately. We developed a system suitability protocol (SSP), which employs a predigested mixture of six proteins, to facilitate performance evaluation of LC-SID-MRM-MS instrument platforms, configured with nanoflow-LC systems interfaced to triple quadrupole mass spectrometers. The SSP was designed for use with low multiplex analyses as well as high multiplex approaches when software-driven scheduling of data acquisition is required. Performance was assessed by monitoring a range of chromatographic and mass spectrometric metrics including peak width, chromatographic resolution, peak capacity, and the variability in peak area and analyte retention time (RT) stability. The SSP, which was evaluated in 11 laboratories on a total of 15 different instruments, enabled early diagnoses of LC and MS anomalies that indicated suboptimal LC-MRM-MS performance. The observed range in variation of each of the metrics scrutinized serves to define the criteria for optimized LC-SID-MRM-MS platforms for routine use, with pass/fail criteria for system suitability performance measures defined as peak area coefficient of variation <0.15, peak width coefficient of variation <0.15, standard deviation of RT <0.15 min (9 s), and RT drift <0.5 min (30 s). The deleterious effect of a marginally performing LC-SID-MRM-MS system on the limit of quantification (LOQ) in targeted quantitative assays illustrates the use of and need for an SSP to establish robust and reliable system performance.
Use of an SSP helps to ensure that analyte quantification measurements can be replicated with good precision within and across multiple laboratories and should facilitate more widespread use of MRM-MS technology by the basic biomedical and clinical laboratory research communities.
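
The four pass/fail thresholds quoted in the abstract can be expressed as a small check. This is only a sketch: the function names are hypothetical, the sample standard deviation is assumed, and RT drift is assumed to mean the difference between the largest and smallest retention time across replicate injections, which the abstract does not define.

```python
from statistics import mean, stdev
from typing import Dict, Sequence


def coefficient_of_variation(values: Sequence[float]) -> float:
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)


def system_suitability(peak_areas: Sequence[float],
                       peak_widths: Sequence[float],
                       retention_times_min: Sequence[float]) -> Dict[str, bool]:
    """Evaluate the four criteria from the abstract for one analyte.

    Each input holds one value per replicate injection. RT drift is
    approximated as max minus min retention time (an assumption).
    """
    rt_drift = max(retention_times_min) - min(retention_times_min)
    return {
        "peak_area_cv": coefficient_of_variation(peak_areas) < 0.15,
        "peak_width_cv": coefficient_of_variation(peak_widths) < 0.15,
        "rt_sd": stdev(retention_times_min) < 0.15,   # minutes
        "rt_drift": rt_drift < 0.5,                   # minutes
    }


if __name__ == "__main__":
    # Hypothetical replicate measurements for a single peptide.
    result = system_suitability(
        peak_areas=[100.0, 105.0, 95.0, 102.0],
        peak_widths=[0.20, 0.21, 0.19, 0.20],
        retention_times_min=[12.00, 12.05, 11.98, 12.02],
    )
    print(result, all(result.values()))
```

Returning one boolean per criterion, rather than a single verdict, keeps it visible which measure failed when a platform does not pass.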