984 results for Evaluation metrics for accomplishment
Abstract:
Sales prediction plays a major role in modern business strategy. One of its many use cases is estimating the effects of promotions. While promotions generally have a positive effect on sales of the promoted product, they can also have a negative effect on the sales of other products. This phenomenon is called sales cannibalisation. Sales cannibalisation can pose a serious problem for sales forecasting algorithms, many of which focus on the sales over time of a single product in a single store (a product-store couple). This research focuses on using knowledge of a product across multiple stores. To achieve this, we applied transfer learning to a neural model developed by Kantar Consulting to demonstrate an approach to estimating the effect of cannibalisation. Our results show a performance increase of between 10 and 14 percent. This is a strong and desired result, and Kantar will use the approach when integrating this test method into their production systems.
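The abstract does not describe the transfer-learning setup in detail, and Kantar's model is not public; the following is a minimal, hypothetical Python/PyTorch sketch of the general idea only: shared layers pretrained on sales data pooled across many stores, with a small head fine-tuned on the target product-store couple. SalesNet, fine_tune and the feature layout are illustrative assumptions, not Kantar's implementation.

# A minimal sketch (not Kantar's actual model) of the transfer-learning idea:
# pretrain a small network on sales series pooled across many stores, then
# freeze its shared layers and fine-tune only the head on one product-store pair.
import torch
import torch.nn as nn

class SalesNet(nn.Module):
    def __init__(self, n_features: int):
        super().__init__()
        # Shared layers: intended to capture product behaviour across stores.
        self.shared = nn.Sequential(nn.Linear(n_features, 32), nn.ReLU(),
                                    nn.Linear(32, 16), nn.ReLU())
        # Head: specialised to a single product-store couple.
        self.head = nn.Linear(16, 1)

    def forward(self, x):
        return self.head(self.shared(x))

def fine_tune(model: SalesNet, x, y, epochs: int = 50):
    # Freeze the cross-store layers; only the head is updated.
    for p in model.shared.parameters():
        p.requires_grad = False
    opt = torch.optim.Adam(model.head.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()
    for _ in range(epochs):
        opt.zero_grad()
        loss = loss_fn(model(x), y)
        loss.backward()
        opt.step()
    return model

# Toy usage with random data standing in for (promotion, price, ...) features.
model = SalesNet(n_features=4)            # pretend this was pretrained on all stores
x, y = torch.randn(64, 4), torch.randn(64, 1)
fine_tune(model, x, y)
with torch.no_grad():
    print(nn.functional.mse_loss(model(x), y).item())  # training error after tuning

Freezing the shared layers is what lets knowledge learned across stores carry over to a couple with little data of its own.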
Abstract:
Master's in Accounting and Management of Financial Institutions
Abstract:
Issues related to association mining have received attention, especially those aiming to discover and facilitate the search for interesting patterns. A promising approach in this context is the application of clustering in the pre-processing step. In this paper, eleven metrics are proposed to provide an assessment procedure supporting the evaluation of this kind of approach. To propose the metrics, a subjective evaluation was carried out. The metrics are important since they provide criteria to: (a) analyze the methodologies, (b) identify their positive and negative aspects, (c) carry out comparisons among them and, therefore, (d) help users to select the most suitable solution for their problems. Moreover, the metrics make users think about aspects of their problems and provide a flexible way to solve them. Experiments were carried out to show how the metrics can be used and to demonstrate their usefulness.
Abstract:
Statistical machine translation (SMT) is an approach to Machine Translation (MT) that uses statistical models whose parameter estimation is based on the analysis of existing human translations (contained in bilingual corpora). From a translation student's standpoint, this dissertation aims to explain how a phrase-based SMT system works, to determine the role of the statistical models it uses in the translation process, and to assess the quality of the translations such a system provides when trained with in-domain, good-quality corpora. To that end, a phrase-based SMT system based on Moses was trained and subsequently used for the English-to-Spanish translation of two texts related in topic to the training data. Finally, the quality of the output texts produced by the system was assessed through a quantitative evaluation carried out with three different automatic evaluation measures and a qualitative evaluation based on the Multidimensional Quality Metrics (MQM).
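The abstract does not name the three automatic measures; BLEU is the standard choice for Moses-based systems, so as an illustration, here is a simplified sentence-level BLEU in Python (modified n-gram precisions combined with a brevity penalty; the smoothing constant is an assumption made to keep the toy example short):

# A simplified sentence-level BLEU (illustrative; the dissertation does not name
# its three measures). Computes modified n-gram precisions with a brevity penalty.
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def simple_bleu(candidate, reference, max_n=4):
    precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum((cand & ref).values())          # clipped n-gram counts
        total = max(sum(cand.values()), 1)
        precisions.append(max(overlap, 1e-9) / total) # smooth to avoid log(0)
    bp = min(1.0, math.exp(1 - len(reference) / len(candidate)))  # brevity penalty
    return bp * math.exp(sum(math.log(p) for p in precisions) / max_n)

hyp = "the cat sat on the mat".split()
ref = "the cat is on the mat".split()
print(round(simple_bleu(hyp, ref), 3))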
Abstract:
This thesis investigates how web search evaluation can be improved using historical interaction data. Modern search engines combine offline and online evaluation approaches in a sequence of steps that a tested change needs to pass through to be accepted as an improvement and subsequently deployed. We refer to such a sequence of steps as an evaluation pipeline and consider it to contain three sequential steps: an offline evaluation step, an online evaluation scheduling step, and an online evaluation step. In this thesis we show that historical user interaction data can aid in improving the accuracy or efficiency of each of these steps and, as a result, the overall efficiency of the entire pipeline.

Firstly, we investigate how user interaction data can be used to build accurate offline evaluation methods for query auto-completion mechanisms. We propose a family of offline evaluation metrics for query auto-completion that represent the effort the user has to spend in order to submit their query. The parameters of our proposed metrics are trained against a set of user interactions recorded in the search engine's query logs. From our experimental study, we observe that our proposed metrics are significantly more correlated with an online user satisfaction indicator than the metrics proposed in the existing literature. Hence, fewer changes will pass the offline evaluation step only to be rejected after the online evaluation step, which increases the efficiency of the entire pipeline.

Secondly, we formulate the problem of the optimised scheduling of online experiments. We tackle this problem with a greedy scheduler that prioritises the evaluation queue according to the predicted likelihood of success of a particular experiment. This predictor is trained on a set of online experiments and uses a diverse set of features to represent an online experiment. Our study demonstrates that a higher number of successful experiments per unit of time can be achieved by deploying such a scheduler on the second step of the evaluation pipeline; consequently, we argue that the efficiency of the pipeline can be increased.

Next, to improve the efficiency of the online evaluation step, we propose the Generalised Team Draft interleaving framework. Generalised Team Draft considers both the interleaving policy (how often a particular combination of results is shown) and click scoring (how important each click is) as parameters in a data-driven optimisation of the interleaving sensitivity. Further, Generalised Team Draft is applicable beyond domains with a list-based representation of results, e.g. in domains with a grid-based representation, such as image search. Our study, using datasets of interleaving experiments performed in both the document and image search domains, demonstrates that Generalised Team Draft achieves the highest sensitivity. A higher sensitivity indicates that interleaving experiments can be deployed for a shorter period of time or on a smaller sample of users. Importantly, Generalised Team Draft optimises the interleaving parameters w.r.t. historical interaction data recorded in the interleaving experiments.

Finally, we propose to apply sequential testing methods to reduce the mean deployment time of interleaving experiments. We adapt two sequential tests for interleaving experimentation and demonstrate that a significant decrease in experiment duration can be achieved by using them. The highest efficiency is achieved by the sequential tests that adjust their stopping thresholds using historical interaction data recorded in diagnostic experiments. A further experimental study demonstrates that cumulative gains in online experimentation efficiency can be achieved by combining the interleaving sensitivity optimisation approaches, including Generalised Team Draft, with the sequential testing approaches.

Overall, the central contributions of this thesis are the proposed approaches to improving the accuracy or efficiency of the steps of the evaluation pipeline: offline evaluation frameworks for query auto-completion, an approach for the optimised scheduling of online experiments, a general framework for efficient online interleaving evaluation, and a sequential testing approach for online search evaluation. The experiments in this thesis are based on massive real-life datasets obtained from Yandex, a leading commercial search engine, and demonstrate the potential of the proposed approaches to improve the efficiency of the evaluation pipeline.
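For reference, classic Team Draft interleaving, the baseline that Generalised Team Draft extends, can be sketched in a few lines of Python; this is the standard published algorithm, not the thesis's generalised variant with learned interleaving policies and click scoring:

# Standard Team Draft interleaving: in each round a coin flip decides which
# ranker picks first, and each ranker appends its highest-ranked document not
# yet in the interleaved list. (Rankings are assumed long enough to fill it.)
import random

def team_draft(ranking_a, ranking_b, length):
    interleaved, team_a, team_b = [], set(), set()
    while len(interleaved) < length:
        # Random order of the two teams within each round = the coin flip.
        for team, ranking, picks in random.sample(
                [("A", ranking_a, team_a), ("B", ranking_b, team_b)], 2):
            doc = next(d for d in ranking if d not in interleaved)
            interleaved.append(doc)
            picks.add(doc)
            if len(interleaved) == length:
                break
    return interleaved, team_a, team_b

mixed, a, b = team_draft(["d1", "d2", "d3", "d4"], ["d3", "d1", "d5", "d2"], 4)
print(mixed, a, b)

The team whose picked documents attract more clicks wins the comparison; Generalised Team Draft additionally tunes how the mixing and the click credit are computed, using historical interaction data.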
Abstract:
This dissertation applies statistical methods to the evaluation of automatic summarization, using data from the Text Analysis Conferences in 2008-2011. Several aspects of the evaluation framework itself are studied, including the statistical testing used to determine significant differences, the assessors, and the design of the experiment. In addition, a family of evaluation metrics is developed to predict the score an automatically generated summary would receive from a human judge, and its results are demonstrated at the Text Analysis Conference. Finally, variations on the evaluation framework are studied and their relative merits considered. An overarching theme of this dissertation is the application of standard statistical methods to data that do not conform to the usual testing assumptions.
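The specific tests examined in the dissertation are not listed in this abstract; as an illustration of the kind of paired comparison involved, the sketch below uses a Wilcoxon signed-rank test (via SciPy), a nonparametric choice that avoids the normality assumption such evaluation data often violate. The scores are made up.

# Illustrative paired comparison of two summarizers' per-topic scores.
# The Wilcoxon signed-rank test does not assume normally distributed score
# differences, which evaluation data frequently violate.
from scipy.stats import wilcoxon

system_a = [0.41, 0.38, 0.45, 0.40, 0.36, 0.44, 0.39, 0.42]
system_b = [0.37, 0.36, 0.44, 0.35, 0.33, 0.41, 0.38, 0.40]

stat, p = wilcoxon(system_a, system_b)
print(f"W = {stat}, p = {p:.3f}")  # small p: the difference is unlikely by chance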
Abstract:
Dissertation presented to the Escola Superior de Educação de Lisboa to obtain the degree of Master in Education Sciences - Specialisation in Supervision in Education
Abstract:
The Darwinian Particle Swarm Optimization (DPSO) is an evolutionary algorithm that extends Particle Swarm Optimization with natural selection to enhance the ability to escape from sub-optimal solutions. An extension of the DPSO to multi-robot applications has recently been proposed and denoted the Robotic Darwinian PSO (RDPSO); it benefits from dynamical partitioning of the whole population of robots, hence decreasing the amount of information exchange required among robots. This paper further extends the previously proposed algorithm by adapting the behavior of robots based on a set of context-based evaluation metrics. Those metrics are then used as inputs of a fuzzy system that systematically adjusts the RDPSO parameters (i.e., the outputs of the fuzzy system), thus improving its convergence rate and its robustness to obstacles and communication constraints. The adapted RDPSO is evaluated on groups of physical robots and further explored using larger populations of simulated mobile robots within a larger scenario.
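For context, the canonical PSO update that DPSO and RDPSO build on can be sketched as follows; the inertia and acceleration coefficients w, c1 and c2 are exactly the kind of parameters the paper's fuzzy system adjusts online. This is a plain single-swarm sketch in Python/NumPy, not the RDPSO implementation:

# Canonical PSO: particles move under inertia (w), attraction to their own
# best position (c1), and attraction to the swarm's best position (c2).
import numpy as np

def pso(f, dim=2, n_particles=20, iters=100, w=0.7, c1=1.5, c2=1.5):
    rng = np.random.default_rng(0)
    x = rng.uniform(-5, 5, (n_particles, dim))    # positions
    v = np.zeros_like(x)                          # velocities
    pbest = x.copy()                              # personal best positions
    pbest_val = np.apply_along_axis(f, 1, x)
    gbest = pbest[pbest_val.argmin()]             # global best position
    for _ in range(iters):
        r1, r2 = rng.random(x.shape), rng.random(x.shape)
        v = w * v + c1 * r1 * (pbest - x) + c2 * r2 * (gbest - x)
        x = x + v
        vals = np.apply_along_axis(f, 1, x)
        improved = vals < pbest_val
        pbest[improved], pbest_val[improved] = x[improved], vals[improved]
        gbest = pbest[pbest_val.argmin()]
    return gbest

print(pso(lambda p: (p ** 2).sum()))  # converges towards the origin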
Abstract:
Dissertation to Obtain the Degree of Master in Biomedical Engineering
Abstract:
Dissertation to obtain the Degree of Master in Biomedical Engineering
Abstract:
Eradication of code smells is often pointed out as a way to improve readability, extensibility and design in existing software. However, code smell detection remains time-consuming and error-prone, partly due to the inherent subjectivity of the detection processes presently available. To mitigate this subjectivity problem, this dissertation presents a tool, developed as an Eclipse plugin, that automates a technique for the detection and assessment of code smells in Java source code. The technique is based on a binary logistic regression model that uses complexity metrics as independent variables and is calibrated by expert knowledge. An overview of the technique is provided, and the tool is described and validated through an example case study.
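The dissertation's calibrated model and exact metric set are not given in this abstract; the sketch below (Python/scikit-learn, with hypothetical features and labels) shows the general shape of such a binary logistic regression detector:

# Illustrative sketch of the detection idea: a binary logistic regression over
# complexity metrics. Feature names and data are hypothetical, not the tool's.
from sklearn.linear_model import LogisticRegression
import numpy as np

# Rows: classes; columns e.g. [lines of code, cyclomatic complexity, coupling].
X = np.array([[120, 15, 9], [40, 3, 2], [300, 28, 14], [55, 5, 3],
              [210, 22, 11], [35, 2, 1], [180, 19, 8], [60, 4, 2]])
y = np.array([1, 0, 1, 0, 1, 0, 1, 0])  # 1 = experts judged the class smelly

model = LogisticRegression(max_iter=1000).fit(X, y)
candidate = np.array([[150, 17, 10]])
print(model.predict_proba(candidate)[0, 1])  # probability the class is a smell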
Abstract:
Aim: Climatic niche modelling of species and community distributions implicitly assumes strong and constant climatic determinism across geographic space. This assumption had, however, never been tested. We tested it by assessing how stacked species distribution models (S-SDMs) perform in predicting plant species assemblages along elevation. Location: Western Swiss Alps. Methods: Using robust presence-absence data, we first assessed the ability of topo-climatic S-SDMs to predict plant assemblages in a study area encompassing a 2800 m wide elevation gradient. We then assessed the relationships among several evaluation metrics and trait-based tests of community assembly rules. Results: The standard errors of the individual SDMs decreased significantly towards higher elevations. Overall, the S-SDMs overpredicted richness far more than they underpredicted it and could not reproduce the humpback curve along elevation. Overprediction was greater at low and mid-range elevations in absolute values, but greater at high elevations when standardised by the actual richness. Looking at species composition, the evaluation metrics accounting for both the presence and absence of species (overall prediction success and kappa) or focusing on correctly predicted absences (specificity) increased with increasing elevation, while the metrics focusing on correctly predicted presences (Jaccard index and sensitivity) decreased. The best overall evaluation - driven by specificity - occurred at high elevation, where species assemblages were shown to be under significant environmental filtering of small plants. In contrast, the decreased overall accuracy in the lowlands was associated with functional patterns representing any type of assembly rule (environmental filtering, limiting similarity or null assembly). Main Conclusions: Our study reveals interesting patterns of change in S-SDM errors with changes in assembly rules along elevation. Yet significant levels of assemblage prediction error occurred throughout the gradient, calling for further improvement of SDMs, e.g. by adding key environmental filters that act at fine scales and by developing approaches that account for variations in the influence of predictors along environmental gradients.
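All of the composition metrics compared in this study derive from the site-level confusion matrix of predicted versus observed species; as a reference, here is a small pure-Python sketch of their standard definitions, applied to made-up presence-absence data:

# Community-level metrics from the confusion matrix of predicted vs. observed
# presence-absence at a single site (illustrative data).
def assemblage_metrics(observed, predicted):
    tp = sum(o and p for o, p in zip(observed, predicted))          # correct presences
    tn = sum(not o and not p for o, p in zip(observed, predicted))  # correct absences
    fp = sum(not o and p for o, p in zip(observed, predicted))
    fn = sum(o and not p for o, p in zip(observed, predicted))
    n = tp + tn + fp + fn
    po = (tp + tn) / n                                              # overall prediction success
    pe = ((tp + fp) * (tp + fn) + (tn + fn) * (tn + fp)) / n**2     # chance agreement
    return {
        "sensitivity": tp / (tp + fn),   # presences correctly predicted
        "specificity": tn / (tn + fp),   # absences correctly predicted
        "jaccard": tp / (tp + fp + fn),  # composition overlap, ignores absences
        "kappa": (po - pe) / (1 - pe),   # agreement corrected for chance
    }

obs  = [1, 1, 0, 0, 1, 0, 0, 0, 1, 0]   # species observed at one site
pred = [1, 0, 1, 0, 1, 0, 0, 0, 1, 1]   # S-SDM predictions for the same species
print(assemblage_metrics(obs, pred))

The study's elevation pattern follows directly from these definitions: at species-poor high-elevation sites, absences dominate, so specificity-weighted metrics improve while presence-focused metrics like the Jaccard index degrade.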
Abstract:
Traditionally, the Iowa Department of Transportation has used the Iowa Runoff Chart and single-variable regional-regression equations (RREs) from a U.S. Geological Survey report (published in 1987) as the primary methods to estimate annual exceedance-probability discharge (AEPD) for small (20 square miles or less) drainage basins in Iowa. With the publication of new multi- and single-variable RREs by the U.S. Geological Survey (published in 2013), the Iowa Department of Transportation needs to determine which methods of AEPD estimation provide the best accuracy and the least bias for small drainage basins in Iowa. Twenty-five streamgages with drainage areas less than 2 square miles (mi2) and 55 streamgages with drainage areas between 2 and 20 mi2 were selected for the comparisons, which used two evaluation metrics. Estimates of AEPDs calculated for the streamgages using the expected moments algorithm/multiple Grubbs-Beck test analysis method were compared to estimates of AEPDs calculated from the 2013 multivariable RREs; the 2013 single-variable RREs; the 1987 single-variable RREs; the TR-55 rainfall-runoff model; and the Iowa Runoff Chart. For the 25 streamgages with drainage areas less than 2 mi2, the results of the comparisons seem to indicate that the best overall accuracy and the least bias may be achieved by using the TR-55 method for flood regions 1 and 3 (published in 2013) and by using the 1987 single-variable RREs for flood region 2 (published in 2013). For drainage basins with areas between 2 and 20 mi2, the results seem to indicate that the best overall accuracy and the least bias may be achieved by using the 1987 single-variable RREs for the Southern Iowa Drift Plain landform region and for flood region 3 (published in 2013), by using the 2013 multivariable RREs for the Iowan Surface landform region, and by using the 2013 or 1987 single-variable RREs for flood region 2 (published in 2013). For all other landform or flood regions in Iowa, use of the 2013 single-variable RREs may provide the best overall accuracy and the least bias. An examination was conducted to understand why the 1987 single-variable RREs seem to provide better accuracy and less bias than either the 2013 multi- or single-variable RREs. A comparison of 1-percent annual exceedance-probability regression lines for hydrologic regions 1–4 from the 1987 single-variable RREs and for flood regions 1–3 from the 2013 single-variable RREs indicates that the 1987 single-variable regression lines generally have steeper slopes and lower discharges than the 2013 single-variable regression lines for corresponding areas of Iowa. The combination of the definition of hydrologic regions, the lower discharges, and the steeper slopes of the regression lines associated with the 1987 single-variable RREs seems to provide better accuracy and less bias than the 2013 multi- or single-variable RREs, particularly for drainage areas less than 2 mi2 and also for some drainage areas between 2 and 20 mi2. The 2013 multi- and single-variable RREs are considered to provide better accuracy and less bias for larger drainage areas. The results of this study indicate that additional research is needed to address the curvilinear relation between drainage area and AEPDs for areas of Iowa.
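Single-variable RREs of the kind compared here typically take the power-law form Q = a·A^b, which is fitted as a straight line in log-log space; the Python sketch below illustrates that form and why the slope matters in the study's comparison. The data and coefficients are made up, not the report's.

# A single-variable RRE has the power-law form Q = a * A**b, fitted as a
# straight line in log-log space. Data and coefficients below are invented.
import numpy as np

area = np.array([0.5, 1.2, 3.0, 7.5, 14.0, 20.0])   # drainage area, mi^2
q100 = np.array([180, 340, 690, 1350, 2100, 2700])  # 1-percent AEPD, ft^3/s

b, log_a = np.polyfit(np.log10(area), np.log10(q100), 1)  # slope, intercept
a = 10 ** log_a
print(f"Q = {a:.0f} * A^{b:.2f}")   # the fitted regression equation
print(a * 2.0 ** b)                 # estimated 1-percent AEPD for a 2-mi^2 basin

# The study's comparison hinges on exactly these quantities: the 1987 equations
# generally had steeper slopes (b) and lower discharges at small drainage areas.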
Abstract:
IT systems play an important role in an organisation's business. Because an organisation's business requirements and strategy change with the surrounding world, the system architecture must adapt to the prevailing situation as well as to possible changes in the short and long term. The architecture of a modern web application adapts to the challenges of an organisation's business. Windows applications in particular become an administrative problem for organisations, because maintaining them ties up human resources and their context of use is limited. For this reason, organisations have begun to look for ways to replace Windows applications with web applications. A cost-effective solution is to modernise the user interface of a Windows application into a web application. The goal of this Master's thesis was to produce a reference architecture for Logica Suomi Oy for modernising the user interface of a Windows application into a web application. The work was carried out in a Proof of Concept project in which Logica's administrator application was modernised. The purpose of the work was to identify widely used architectural patterns and methods that make such a modernisation feasible, as well as the methods and software that enable cost-effective, high-quality web application development and implementation. A secondary goal was to produce the overall architecture of the administrator application being modernised. The result of the work is a reference architecture that can be used in software development projects, customer documentation, sales and marketing. The reference architecture presents modern web technologies with which it is possible to implement a web application whose user experience matches that of a Windows application. A further result is the overall architecture of the administrator application, whose most important elements are the target state of the modernisation and the application architecture. The most important follow-up work is the preparation of a modernisation framework based on the reference architecture, and the definition, design and implementation of the metrics used to evaluate a modernisation project. Relevant metrics make it possible to determine whether the modernised application meets the organisation's business requirements and strategy.
Abstract:
The vast majority of our contemporary society owns a mobile phone, which has resulted in a dramatic rise in the number of networked computers in recent years. Security issues in computers have followed the same trend, and nearly everyone is now affected by them. How could the situation be improved? For software engineers, an obvious answer is to build computer software with security in mind. A problem with building secure software is how to define it, or how to measure security. This thesis divides the problem into three research questions. First, how can we measure the security of software? Second, what types of tools are available for measuring security? And finally, what do these tools reveal about the security of software? Measuring tools of this kind are commonly called metrics. This thesis focuses on the perspective of software engineers in the software design phase. Focus on the design phase means that code-level semantics and programming-language specifics are not discussed in this work. Organizational policy, management issues and the software development process are also out of scope. The first two research problems were studied using a literature review, while the third was studied using case study research. The target of the case study was a Java-based email server called Apache James, whose changelog and security issue details were available and whose source code was accessible. The research revealed that there is a consensus in the terminology on software security. Security verification activities are commonly divided into evaluation and assurance. The focus of this work was on assurance, which means verifying one's own work. There are 34 metrics available for security measurement, of which five are evaluation metrics and 29 are assurance metrics. We found, however, that the general quality of these metrics was not good. Only three metrics in the design category passed the inspection criteria and could be used in the case study. The metrics claim to give quantitative information on the security of the software, but in practice they were limited to evaluating different versions of the same software. Apart from being relative, the metrics were unable to detect security issues or point out problems in the design. Furthermore, interpreting the metrics' results was difficult. In conclusion, the general state of software security metrics leaves a lot to be desired. The metrics studied had both theoretical and practical issues and are not suitable for daily engineering workflows. They nevertheless provide a basis for further research, since they point out areas where security metrics must improve if verification of security at the design stage is desired.