966 resultados para clustering quality metrics


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Los avances en el hardware permiten disponer de grandes volúmenes de datos, surgiendo aplicaciones que deben suministrar información en tiempo cuasi-real, la monitorización de pacientes, ej., el seguimiento sanitario de las conducciones de agua, etc. Las necesidades de estas aplicaciones hacen emerger el modelo de flujo de datos (data streaming) frente al modelo almacenar-para-despuésprocesar (store-then-process). Mientras que en el modelo store-then-process, los datos son almacenados para ser posteriormente consultados; en los sistemas de streaming, los datos son procesados a su llegada al sistema, produciendo respuestas continuas sin llegar a almacenarse. Esta nueva visión impone desafíos para el procesamiento de datos al vuelo: 1) las respuestas deben producirse de manera continua cada vez que nuevos datos llegan al sistema; 2) los datos son accedidos solo una vez y, generalmente, no son almacenados en su totalidad; y 3) el tiempo de procesamiento por dato para producir una respuesta debe ser bajo. Aunque existen dos modelos para el cómputo de respuestas continuas, el modelo evolutivo y el de ventana deslizante; éste segundo se ajusta mejor en ciertas aplicaciones al considerar únicamente los datos recibidos más recientemente, en lugar de todo el histórico de datos. En los últimos años, la minería de datos en streaming se ha centrado en el modelo evolutivo. Mientras que, en el modelo de ventana deslizante, el trabajo presentado es más reducido ya que estos algoritmos no sólo deben de ser incrementales si no que deben borrar la información que caduca por el deslizamiento de la ventana manteniendo los anteriores tres desafíos. Una de las tareas fundamentales en minería de datos es la búsqueda de agrupaciones donde, dado un conjunto de datos, el objetivo es encontrar grupos representativos, de manera que se tenga una descripción sintética del conjunto. Estas agrupaciones son fundamentales en aplicaciones como la detección de intrusos en la red o la segmentación de clientes en el marketing y la publicidad. Debido a las cantidades masivas de datos que deben procesarse en este tipo de aplicaciones (millones de eventos por segundo), las soluciones centralizadas puede ser incapaz de hacer frente a las restricciones de tiempo de procesamiento, por lo que deben recurrir a descartar datos durante los picos de carga. Para evitar esta perdida de datos, se impone el procesamiento distribuido de streams, en concreto, los algoritmos de agrupamiento deben ser adaptados para este tipo de entornos, en los que los datos están distribuidos. En streaming, la investigación no solo se centra en el diseño para tareas generales, como la agrupación, sino también en la búsqueda de nuevos enfoques que se adapten mejor a escenarios particulares. Como ejemplo, un mecanismo de agrupación ad-hoc resulta ser más adecuado para la defensa contra la denegación de servicio distribuida (Distributed Denial of Services, DDoS) que el problema tradicional de k-medias. En esta tesis se pretende contribuir en el problema agrupamiento en streaming tanto en entornos centralizados y distribuidos. Hemos diseñado un algoritmo centralizado de clustering mostrando las capacidades para descubrir agrupaciones de alta calidad en bajo tiempo frente a otras soluciones del estado del arte, en una amplia evaluación. Además, se ha trabajado sobre una estructura que reduce notablemente el espacio de memoria necesario, controlando, en todo momento, el error de los cómputos. Nuestro trabajo también proporciona dos protocolos de distribución del cómputo de agrupaciones. Se han analizado dos características fundamentales: el impacto sobre la calidad del clustering al realizar el cómputo distribuido y las condiciones necesarias para la reducción del tiempo de procesamiento frente a la solución centralizada. Finalmente, hemos desarrollado un entorno para la detección de ataques DDoS basado en agrupaciones. En este último caso, se ha caracterizado el tipo de ataques detectados y se ha desarrollado una evaluación sobre la eficiencia y eficacia de la mitigación del impacto del ataque. ABSTRACT Advances in hardware allow to collect huge volumes of data emerging applications that must provide information in near-real time, e.g., patient monitoring, health monitoring of water pipes, etc. The data streaming model emerges to comply with these applications overcoming the traditional store-then-process model. With the store-then-process model, data is stored before being consulted; while, in streaming, data are processed on the fly producing continuous responses. The challenges of streaming for processing data on the fly are the following: 1) responses must be produced continuously whenever new data arrives in the system; 2) data is accessed only once and is generally not maintained in its entirety, and 3) data processing time to produce a response should be low. Two models exist to compute continuous responses: the evolving model and the sliding window model; the latter fits best with applications must be computed over the most recently data rather than all the previous data. In recent years, research in the context of data stream mining has focused mainly on the evolving model. In the sliding window model, the work presented is smaller since these algorithms must be incremental and they must delete the information which expires when the window slides. Clustering is one of the fundamental techniques of data mining and is used to analyze data sets in order to find representative groups that provide a concise description of the data being processed. Clustering is critical in applications such as network intrusion detection or customer segmentation in marketing and advertising. Due to the huge amount of data that must be processed by such applications (up to millions of events per second), centralized solutions are usually unable to cope with timing restrictions and recur to shedding techniques where data is discarded during load peaks. To avoid discarding of data, processing of streams (such as clustering) must be distributed and adapted to environments where information is distributed. In streaming, research does not only focus on designing for general tasks, such as clustering, but also in finding new approaches that fit bests with particular scenarios. As an example, an ad-hoc grouping mechanism turns out to be more adequate than k-means for defense against Distributed Denial of Service (DDoS). This thesis contributes to the data stream mining clustering technique both for centralized and distributed environments. We present a centralized clustering algorithm showing capabilities to discover clusters of high quality in low time and we provide a comparison with existing state of the art solutions. We have worked on a data structure that significantly reduces memory requirements while controlling the error of the clusters statistics. We also provide two distributed clustering protocols. We focus on the analysis of two key features: the impact on the clustering quality when computation is distributed and the requirements for reducing the processing time compared to the centralized solution. Finally, with respect to ad-hoc grouping techniques, we have developed a DDoS detection framework based on clustering.We have characterized the attacks detected and we have evaluated the efficiency and effectiveness of mitigating the attack impact.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Systematic evaluation of Learning Objects is essential to make high quality Web-based education possible. For this reason, several educational repositories and e-Learning systems have developed their own evaluation models and tools. However, the differences of the context in which Learning Objects are produced and consumed suggest that no single evaluation model is sufficient for all scenarios. Besides, no much effort has been put in developing open tools to facilitate Learning Object evaluation and use the quality information for the benefit of end users. This paper presents LOEP, an open source web platform that aims to facilitate Learning Object evaluation in different scenarios and educational settings by supporting and integrating several evaluation models and quality metrics. The work exposed in this paper shows that LOEP is capable of providing Learning Object evaluation to e-Learning systems in an open, low cost, reliable and effective way. Possible scenarios where LOEP could be used to implement quality control policies and to enhance search engines are also described. Finally, we report the results of a survey conducted among reviewers that used LOEP, showing that they perceived LOEP as a powerful and easy to use tool for evaluating Learning Objects.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

LHE (logarithmical hopping encoding) is a computationally efficient image compression algorithm that exploits the Weber–Fechner law to encode the error between colour component predictions and the actual value of such components. More concretely, for each pixel, luminance and chrominance predictions are calculated as a function of the surrounding pixels and then the error between the predictions and the actual values are logarithmically quantised. The main advantage of LHE is that although it is capable of achieving a low-bit rate encoding with high quality results in terms of peak signal-to-noise ratio (PSNR) and image quality metrics with full-reference (FSIM) and non-reference (blind/referenceless image spatial quality evaluator), its time complexity is O( n) and its memory complexity is O(1). Furthermore, an enhanced version of the algorithm is proposed, where the output codes provided by the logarithmical quantiser are used in a pre-processing stage to estimate the perceptual relevance of the image blocks. This allows the algorithm to downsample the blocks with low perceptual relevance, thus improving the compression rate. The performance of LHE is especially remarkable when the bit per pixel rate is low, showing much better quality, in terms of PSNR and FSIM, than JPEG and slightly lower quality than JPEG-2000 but being more computationally efficient.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Statistical machine translation (SMT) is an approach to Machine Translation (MT) that uses statistical models whose parameter estimation is based on the analysis of existing human translations (contained in bilingual corpora). From a translation student’s standpoint, this dissertation aims to explain how a phrase-based SMT system works, to determine the role of the statistical models it uses in the translation process and to assess the quality of the translations provided that system is trained with in-domain goodquality corpora. To that end, a phrase-based SMT system based on Moses has been trained and subsequently used for the English to Spanish translation of two texts related in topic to the training data. Finally, the quality of this output texts produced by the system has been assessed through a quantitative evaluation carried out with three different automatic evaluation measures and a qualitative evaluation based on the Multidimensional Quality Metrics (MQM).

Relevância:

80.00% 80.00%

Publicador:

Resumo:

There is growing interest in the use of context-awareness as a technique for developing pervasive computing applications that are flexible, adaptable, and capable of acting autonomously on behalf of users. However, context-awareness introduces a variety of software engineering challenges. In this paper, we address these challenges by proposing a set of conceptual models designed to support the software engineering process, including context modelling techniques, a preference model for representing context-dependent requirements, and two programming models. We also present a software infrastructure and software engineering process that can be used in conjunction with our models. Finally, we discuss a case study that demonstrates the strengths of our models and software engineering approach with respect to a set of software quality metrics.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Purpose – The purpose of this paper is to investigate how research and development (R&D) collaboration takes place for complex new products in the automotive sector. The research aims to give guidelines to increase the effectiveness of such collaborations. Design/methodology/approach – The methodology used to investigate this issue was grounded theory. The empirical data were collected through a mixture of interviews and questionnaires. The resulting inducted conceptual models were subsequently validated in industrial workshops. Findings – The findings show that frontloading of the collaborative members was a major issue in managing successful R&D collaborations. Research limitations/implications – The limitation of this research is that it is only based in the German automotive industry. Practical implications – Practical implications have come out of this research. Models and guidelines are given to help make a success of collaborative projects and their potential impacts on time, cost and quality metrics. Originality/value – Frontloading is not often studied in a collaborative manner; it is normally studied within just one organisation. This study has novel value because it has involved a number of different members throughout the supplier network.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The practice of evaluating faculty and business schools based on their journal publications has increased the emphasis on research output in peer reviewed journals. Since journal standings are a frequently debated issue, this study seeks to examine the perceptual differences of journals between different segments of marketing academics. Based on a worldwide online survey, journals are assessed in terms of four subjective quality metrics: journal familiarity, average rank position, percent of respondents who classify a journal as top tier, and readership. It is demonstrated that an individual's geographic origin, research interests or journal affiliation can have a significant impact on journal rankings.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Since researchers and academic institutions are increasingly evaluated based on their publication record in peer reviewed journals, it is crucial to assess how the statistics community perceives statistics journals. This study presents four subjective quality metrics of statistics journals as expressed by different segments of statisticians. Based on a worldwide sample of 2,190 statisticians, our findings indicate that the research interest and geographic origin of the researcher have a significant impact on journal perceptions, which are highly correlated with a journal's total number of citations.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Because poor quality semantic metadata can destroy the effectiveness of semantic web technology by hampering applications from producing accurate results, it is important to have frameworks that support their evaluation. However, there is no such framework developedto date. In this context, we proposed i) an evaluation reference model, SemRef, which sketches some fundamental principles for evaluating semantic metadata, and ii) an evaluation framework, SemEval, which provides a set of instruments to support the detection of quality problems and the collection of quality metrics for these problems. A preliminary case study of SemEval shows encouraging results.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Analysing the molecular polymorphism and interactions of DNA, RNA and proteins is of fundamental importance in biology. Predicting functions of polymorphic molecules is important in order to design more effective medicines. Analysing major histocompatibility complex (MHC) polymorphism is important for mate choice, epitope-based vaccine design and transplantation rejection etc. Most of the existing exploratory approaches cannot analyse these datasets because of the large number of molecules with a high number of descriptors per molecule. This thesis develops novel methods for data projection in order to explore high dimensional biological dataset by visualising them in a low-dimensional space. With increasing dimensionality, some existing data visualisation methods such as generative topographic mapping (GTM) become computationally intractable. We propose variants of these methods, where we use log-transformations at certain steps of expectation maximisation (EM) based parameter learning process, to make them tractable for high-dimensional datasets. We demonstrate these proposed variants both for synthetic and electrostatic potential dataset of MHC class-I. We also propose to extend a latent trait model (LTM), suitable for visualising high dimensional discrete data, to simultaneously estimate feature saliency as an integrated part of the parameter learning process of a visualisation model. This LTM variant not only gives better visualisation by modifying the project map based on feature relevance, but also helps users to assess the significance of each feature. Another problem which is not addressed much in the literature is the visualisation of mixed-type data. We propose to combine GTM and LTM in a principled way where appropriate noise models are used for each type of data in order to visualise mixed-type data in a single plot. We call this model a generalised GTM (GGTM). We also propose to extend GGTM model to estimate feature saliencies while training a visualisation model and this is called GGTM with feature saliency (GGTM-FS). We demonstrate effectiveness of these proposed models both for synthetic and real datasets. We evaluate visualisation quality using quality metrics such as distance distortion measure and rank based measures: trustworthiness, continuity, mean relative rank errors with respect to data space and latent space. In cases where the labels are known we also use quality metrics of KL divergence and nearest neighbour classifications error in order to determine the separation between classes. We demonstrate the efficacy of these proposed models both for synthetic and real biological datasets with a main focus on the MHC class-I dataset.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Premium Intraocular Lenses (IOLs) such as toric IOLs, multifocal IOLs (MIOLs) and accommodating IOLs (AIOLs) can provide better refractive and visual outcomes compared to standard monofocal designs, leading to greater levels of post-operative spectacle independence. The principal theme of this thesis relates to the development of new assessment techniques that can help to improve future premium IOL design. IOLs designed to correct astigmatism form the focus of the first part of the thesis. A novel toric IOL design was devised to decrease the effect of toric rotation on patient visual acuity, but found to have neither a beneficial or detrimental impact on visual acuity retention. IOL tilt, like rotation, may curtail visual performance; however current IOL tilt measurement techniques require the use of specialist equipment not readily available in most ophthalmological clinics. Thus a new idea that applied Pythagoras’s theory to digital images of IOL optic symmetricality in order to calculate tilt was proposed, and shown to be both accurate and highly repeatable. A literature review revealed little information on the relationship between IOL tilt, decentration and rotation and so this was examined. A poor correlation between these factors was found, indicating they occur independently of each other. Next, presbyopia correcting IOLs were investigated. The light distribution of different MIOLs and an AIOL was assessed using perimetry, to establish whether this could be used to inform optimal IOL design. Anticipated differences in threshold sensitivity between IOLs were not however found, thus perimetry was concluded to be ineffective in mapping retinal projection of blur. The observed difference between subjective and objective measures of accommodation, arising from the influence of pseudoaccommodative factors, was explored next to establish how much additional objective power would be required to restore the eye’s focus with AIOLs. Blur tolerance was found to be the key contributor to the ocular depth of focus, with an approximate dioptric influence of 0.60D. Our understanding of MIOLs may be limited by the need for subjective defocus curves, which are lengthy and do not permit important additional measures to be undertaken. The use of aberrometry to provide faster objective defocus curves was examined. Although subjective and objective measures related well, the peaks of the MIOL defocus curve profile were not evident with objective prediction of acuity, indicating a need for further refinement of visual quality metrics based on ocular aberrations. The experiments detailed in the thesis evaluate methods to improve visual performance with toric IOLs. They also investigate new techniques to allow more rapid post-operative assessment of premium IOLs, which could allow greater insights to be obtained into several aspects of visual quality, in order to optimise future IOL design and ultimately enhance patient satisfaction.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Deforestation in the tropical Andes is affecting ecological conditions of streams, and determination of how much forest should be retained is a pressing task for conservation, restoration and management strategies. We calculated and analyzed eight benthic metrics (structural, compositional and water quality indices) and a physical-chemical composite index with gradients of vegetation cover to assess the effects of deforestation on macroinvertebrate communities and water quality of 23 streams in southern Ecuadorian Andes. Using a geographical information system (GIS), we quantified vegetation cover at three spatial scales: the entire catchment, the riparian buffer of 30 m width extending the entire stream length, and the local scale defined for a stream reach of 100 m in length and similar buffer width. Macroinvertebrate and water quality metrics had the strongest relationships with vegetation cover at catchment and riparian scales, while vegetation cover did not show any association with the macroinvertebrate metrics at local scale. At catchment scale, the water quality metrics indicate that ecological condition of Andean streams is good when vegetation cover is over 70%. Further, macroinvertebrate community assemblages were more diverse and related in catchments largely covered by native vegetation (>70%). Overall, our results suggest that retaining an important quantity of native vegetation cover within the catchments and a linkage between headwater and riparian forests help to maintain and improve stream biodiversity and water quality in Andean streams affected by deforestation. Also, this research proposes that a strong regulation focused to the management of riparian buffers can be successful when decision making is addressed to conservation/restoration of Andean catchments.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Purpose: To determine whether the ‘through-focus’ aberrations of a multifocal and accommodative intraocular lens (IOL) implanted patient can be used to provide rapid and reliable measures of their subjective range of clear vision. Methods: Eyes that had been implanted with a concentric (n = 8), segmented (n = 10) or accommodating (n = 6) intraocular lenses (mean age 62.9 ± 8.9 years; range 46-79 years) for over a year underwent simultaneous monocular subjective (electronic logMAR test chart at 4m with letters randomised between presentations) and objective (Aston open-field aberrometer) defocus curve testing for levels of defocus between +1.50 to -5.00DS in -0.50DS steps, in a randomised order. Pupil size and ocular aberration (a combination of the patient’s and the defocus inducing lens aberrations) at each level of blur was measured by the aberrometer. Visual acuity was measured subjectively at each level of defocus to determine the traditional defocus curve. Objective acuity was predicted using image quality metrics. Results: The range of clear focus differed between the three IOL types (F=15.506, P=0.001) as well as between subjective and objective defocus curves (F=6.685, p=0.049). There was no statistically significant difference between subjective and objective defocus curves in the segmented or concentric ring MIOL group (P>0.05). However a difference was found between the two measures and the accommodating IOL group (P<0.001). Mean Delta logMAR (predicted minus measured logMAR) across all target vergences was -0.06 ± 0.19 logMAR. Predicted logMAR defocus curves for the multifocal IOLs did not show a near vision addition peak, unlike the subjective measurement of visual acuity. However, there was a strong positive correlation between measured and predicted logMAR for all three IOLs (Pearson’s correlation: P<0.001). Conclusions: Current subjective procedures are lengthy and do not enable important additional measures such as defocus curves under differently luminance or contrast levels to be assessed, which may limit our understanding of MIOL performance in real-world conditions. In general objective aberrometry measures correlated well with the subjective assessment indicating the relative robustness of this technique in evaluating post-operative success with segmented and concentric ring MIOL.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Osteoporosis is one of the major causes of mortality among the elderly. Nowadays, areal bone mineral density (aBMD) is used as diagnostic criteria for osteoporosis; however, this is a moderate predictor of the femur fracture risk and does not capture the effect of some anatomical and physiological properties on the bone strength estimation. Data from past research suggest that most fragility femur fractures occur in patients with aBMD values outside the pathological range. Subject-specific finite element models derived from computed tomography data are considered better tools to non-invasively assess hip fracture risk. In particular, the Bologna Biomechanical Computed Tomography (BBCT) is an In Silico methodology that uses a subject specific FE model to predict bone strength. Different studies demonstrated that the modeling pipeline can increase predictive accuracy of osteoporosis detection and assess the efficacy of new antiresorptive drugs. However, one critical aspect that must be properly addressed before using the technology in the clinical practice, is the assessment of the model credibility. The aim of this study was to define and perform verification and uncertainty quantification analyses on the BBCT methodology following the risk-based credibility assessment framework recently proposed in the VV-40 standard. The analyses focused on the main verification tests used in computational solid mechanics: force and moment equilibrium check, mesh convergence analyses, mesh quality metrics study, evaluation of the uncertainties associated to the definition of the boundary conditions and material properties mapping. Results of these analyses showed that the FE model is correctly implemented and solved. The operation that mostly affect the model results is the material properties mapping step. This work represents an important step that, together with the ongoing clinical validation activities, will contribute to demonstrate the credibility of the BBCT methodology.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This thesis has been written as a result of the Language Toolkit project, organised by the Department of Interpreting and Translation of Forlì in collaboration with the Chamber of Commerce of Romagna. The aim of the project is to facilitate the internationalisation of small and medium enterprises in Romagna by making available to them the skills acquired by the students of the Faculty of Specialized Translation, who in turn are given the opportunity to approach an authentic professional context. Specifically, this thesis is the outcome of the 300-hour internship envisaged by the project, 75 of which were carried out at Jopla S.r.l. SB. The task assigned to the student was the translation into French of the Jopla For You web app and the Jopla PRO mobile app. This thesis consists of five chapters. The first chapter provides a general description of the Language Toolkit project and it focuses on the concept of translation into a non-native language. The second chapter outlines the theoretical context in which translation is set. Subsequently, the focus shifts to the topics of text, discourse, genre and textual typology, alongside a reflection on the applicability of these notions to web texts, and an analysis of the source text following Nord's model. The fourth chapter is dedicated to a description of the resources used in the preparation and translation phases. The fifth chapter describes the macro and micro strategies employed to carry out the translation. Furthermore, a comparative analysis between the human translation and the one provided by Google Translator is delivered. This analysis involves two methods: the first one follows the linguistic norms of the target language, while the second one relies on the error categorisation of the MQM model. Finally, the performance of Google Translate is investigated through the comparison of the results obtained from the MQM evaluation conducted in this thesis with the results obtained by Martellini (2021) in her analysis.