819 resultados para CMS, popularity, Machine Learning, DCAFPilot
Resumo:
In CMS è stato lanciato un progetto di Data Analytics e, all’interno di esso, un’attività specifica pilota che mira a sfruttare tecniche di Machine Learning per predire la popolarità dei dataset di CMS. Si tratta di un’osservabile molto delicata, la cui eventuale predizione premetterebbe a CMS di costruire modelli di data placement più intelligenti, ampie ottimizzazioni nell’uso dello storage a tutti i livelli Tiers, e formerebbe la base per l’introduzione di un solito sistema di data management dinamico e adattivo. Questa tesi descrive il lavoro fatto sfruttando un nuovo prototipo pilota chiamato DCAFPilot, interamente scritto in python, per affrontare questa sfida.
Resumo:
This thesis presents a study of the Grid data access patterns in distributed analysis in the CMS experiment at the LHC accelerator. This study ranges from the deep analysis of the historical patterns of access to the most relevant data types in CMS, to the exploitation of a supervised Machine Learning classification system to set-up a machinery able to eventually predict future data access patterns - i.e. the so-called dataset “popularity” of the CMS datasets on the Grid - with focus on specific data types. All the CMS workflows run on the Worldwide LHC Computing Grid (WCG) computing centers (Tiers), and in particular the distributed analysis systems sustains hundreds of users and applications submitted every day. These applications (or “jobs”) access different data types hosted on disk storage systems at a large set of WLCG Tiers. The detailed study of how this data is accessed, in terms of data types, hosting Tiers, and different time periods, allows to gain precious insight on storage occupancy over time and different access patterns, and ultimately to extract suggested actions based on this information (e.g. targetted disk clean-up and/or data replication). In this sense, the application of Machine Learning techniques allows to learn from past data and to gain predictability potential for the future CMS data access patterns. Chapter 1 provides an introduction to High Energy Physics at the LHC. Chapter 2 describes the CMS Computing Model, with special focus on the data management sector, also discussing the concept of dataset popularity. Chapter 3 describes the study of CMS data access patterns with different depth levels. Chapter 4 offers a brief introduction to basic machine learning concepts and gives an introduction to its application in CMS and discuss the results obtained by using this approach in the context of this thesis.
Resumo:
The scientific success of the LHC experiments at CERN highly depends on the availability of computing resources which efficiently store, process, and analyse the amount of data collected every year. This is ensured by the Worldwide LHC Computing Grid infrastructure that connect computing centres distributed all over the world with high performance network. LHC has an ambitious experimental program for the coming years, which includes large investments and improvements both for the hardware of the detectors and for the software and computing systems, in order to deal with the huge increase in the event rate expected from the High Luminosity LHC (HL-LHC) phase and consequently with the huge amount of data that will be produced. Since few years the role of Artificial Intelligence has become relevant in the High Energy Physics (HEP) world. Machine Learning (ML) and Deep Learning algorithms have been successfully used in many areas of HEP, like online and offline reconstruction programs, detector simulation, object reconstruction, identification, Monte Carlo generation, and surely they will be crucial in the HL-LHC phase. This thesis aims at contributing to a CMS R&D project, regarding a ML "as a Service" solution for HEP needs (MLaaS4HEP). It consists in a data-service able to perform an entire ML pipeline (in terms of reading data, processing data, training ML models, serving predictions) in a completely model-agnostic fashion, directly using ROOT files of arbitrary size from local or distributed data sources. This framework has been updated adding new features in the data preprocessing phase, allowing more flexibility to the user. Since the MLaaS4HEP framework is experiment agnostic, the ATLAS Higgs Boson ML challenge has been chosen as physics use case, with the aim to test MLaaS4HEP and the contribution done with this work.
Resumo:
PURPOSE: To evaluate the sensitivity and specificity of machine learning classifiers (MLCs) for glaucoma diagnosis using Spectral Domain OCT (SD-OCT) and standard automated perimetry (SAP). METHODS: Observational cross-sectional study. Sixty two glaucoma patients and 48 healthy individuals were included. All patients underwent a complete ophthalmologic examination, achromatic standard automated perimetry (SAP) and retinal nerve fiber layer (RNFL) imaging with SD-OCT (Cirrus HD-OCT; Carl Zeiss Meditec Inc., Dublin, California). Receiver operating characteristic (ROC) curves were obtained for all SD-OCT parameters and global indices of SAP. Subsequently, the following MLCs were tested using parameters from the SD-OCT and SAP: Bagging (BAG), Naive-Bayes (NB), Multilayer Perceptron (MLP), Radial Basis Function (RBF), Random Forest (RAN), Ensemble Selection (ENS), Classification Tree (CTREE), Ada Boost M1(ADA),Support Vector Machine Linear (SVML) and Support Vector Machine Gaussian (SVMG). Areas under the receiver operating characteristic curves (aROC) obtained for isolated SAP and OCT parameters were compared with MLCs using OCT+SAP data. RESULTS: Combining OCT and SAP data, MLCs' aROCs varied from 0.777(CTREE) to 0.946 (RAN).The best OCT+SAP aROC obtained with RAN (0.946) was significantly larger the best single OCT parameter (p<0.05), but was not significantly different from the aROC obtained with the best single SAP parameter (p=0.19). CONCLUSION: Machine learning classifiers trained on OCT and SAP data can successfully discriminate between healthy and glaucomatous eyes. The combination of OCT and SAP measurements improved the diagnostic accuracy compared with OCT data alone.
Resumo:
There is not a specific test to diagnose Alzheimer`s disease (AD). Its diagnosis should be based upon clinical history, neuropsychological and laboratory tests, neuroimaging and electroencephalography (EEG). Therefore, new approaches are necessary to enable earlier and more accurate diagnosis and to follow treatment results. In this study we used a Machine Learning (ML) technique, named Support Vector Machine (SVM), to search patterns in EEG epochs to differentiate AD patients from controls. As a result, we developed a quantitative EEG (qEEG) processing method for automatic differentiation of patients with AD from normal individuals, as a complement to the diagnosis of probable dementia. We studied EEGs from 19 normal subjects (14 females/5 males, mean age 71.6 years) and 16 probable mild to moderate symptoms AD patients (14 females/2 males, mean age 73.4 years. The results obtained from analysis of EEG epochs were accuracy 79.9% and sensitivity 83.2%. The analysis considering the diagnosis of each individual patient reached 87.0% accuracy and 91.7% sensitivity.
Resumo:
Introduction: A major focus of data mining process - especially machine learning researches - is to automatically learn to recognize complex patterns and help to take the adequate decisions strictly based on the acquired data. Since imaging techniques like MPI – Myocardial Perfusion Imaging on Nuclear Cardiology, can implicate a huge part of the daily workflow and generate gigabytes of data, there could be advantages on Computerized Analysis of data over Human Analysis: shorter time, homogeneity and consistency, automatic recording of analysis results, relatively inexpensive, etc.Objectives: The aim of this study relates with the evaluation of the efficacy of this methodology on the evaluation of MPI Stress studies and the process of decision taking concerning the continuation – or not – of the evaluation of each patient. It has been pursued has an objective to automatically classify a patient test in one of three groups: “Positive”, “Negative” and “Indeterminate”. “Positive” would directly follow to the Rest test part of the exam, the “Negative” would be directly exempted from continuation and only the “Indeterminate” group would deserve the clinician analysis, so allowing economy of clinician’s effort, increasing workflow fluidity at the technologist’s level and probably sparing time to patients. Methods: WEKA v3.6.2 open source software was used to make a comparative analysis of three WEKA algorithms (“OneR”, “J48” and “Naïve Bayes”) - on a retrospective study using the comparison with correspondent clinical results as reference, signed by nuclear cardiologist experts - on “SPECT Heart Dataset”, available on University of California – Irvine, at the Machine Learning Repository. For evaluation purposes, criteria as “Precision”, “Incorrectly Classified Instances” and “Receiver Operating Characteristics (ROC) Areas” were considered. Results: The interpretation of the data suggests that the Naïve Bayes algorithm has the best performance among the three previously selected algorithms. Conclusions: It is believed - and apparently supported by the findings - that machine learning algorithms could significantly assist, at an intermediary level, on the analysis of scintigraphic data obtained on MPI, namely after Stress acquisition, so eventually increasing efficiency of the entire system and potentially easing both roles of Technologists and Nuclear Cardiologists. In the actual continuation of this study, it is planned to use more patient information and significantly increase the population under study, in order to allow improving system accuracy.
Resumo:
Sendo uma forma natural de interação homem-máquina, o reconhecimento de gestos implica uma forte componente de investigação em áreas como a visão por computador e a aprendizagem computacional. O reconhecimento gestual é uma área com aplicações muito diversas, fornecendo aos utilizadores uma forma mais natural e mais simples de comunicar com sistemas baseados em computador, sem a necessidade de utilização de dispositivos extras. Assim, o objectivo principal da investigação na área de reconhecimento de gestos aplicada à interacção homemmáquina é o da criação de sistemas, que possam identificar gestos específicos e usálos para transmitir informações ou para controlar dispositivos. Para isso as interfaces baseados em visão para o reconhecimento de gestos, necessitam de detectar a mão de forma rápida e robusta e de serem capazes de efetuar o reconhecimento de gestos em tempo real. Hoje em dia, os sistemas de reconhecimento de gestos baseados em visão são capazes de trabalhar com soluções específicas, construídos para resolver um determinado problema e configurados para trabalhar de uma forma particular. Este projeto de investigação estudou e implementou soluções, suficientemente genéricas, com o recurso a algoritmos de aprendizagem computacional, permitindo a sua aplicação num conjunto alargado de sistemas de interface homem-máquina, para reconhecimento de gestos em tempo real. A solução proposta, Gesture Learning Module Architecture (GeLMA), permite de forma simples definir um conjunto de comandos que pode ser baseado em gestos estáticos e dinâmicos e que pode ser facilmente integrado e configurado para ser utilizado numa série de aplicações. É um sistema de baixo custo e fácil de treinar e usar, e uma vez que é construído unicamente com bibliotecas de código. As experiências realizadas permitiram mostrar que o sistema atingiu uma precisão de 99,2% em termos de reconhecimento de gestos estáticos e uma precisão média de 93,7% em termos de reconhecimento de gestos dinâmicos. Para validar a solução proposta, foram implementados dois sistemas completos. O primeiro é um sistema em tempo real capaz de ajudar um árbitro a arbitrar um jogo de futebol robótico. A solução proposta combina um sistema de reconhecimento de gestos baseada em visão com a definição de uma linguagem formal, o CommLang Referee, à qual demos a designação de Referee Command Language Interface System (ReCLIS). O sistema identifica os comandos baseados num conjunto de gestos estáticos e dinâmicos executados pelo árbitro, sendo este posteriormente enviado para um interface de computador que transmite a respectiva informação para os robôs. O segundo é um sistema em tempo real capaz de interpretar um subconjunto da Linguagem Gestual Portuguesa. As experiências demonstraram que o sistema foi capaz de reconhecer as vogais em tempo real de forma fiável. Embora a solução implementada apenas tenha sido treinada para reconhecer as cinco vogais, o sistema é facilmente extensível para reconhecer o resto do alfabeto. As experiências também permitiram mostrar que a base dos sistemas de interação baseados em visão pode ser a mesma para todas as aplicações e, deste modo facilitar a sua implementação. A solução proposta tem ainda a vantagem de ser suficientemente genérica e uma base sólida para o desenvolvimento de sistemas baseados em reconhecimento gestual que podem ser facilmente integrados com qualquer aplicação de interface homem-máquina. A linguagem formal de definição da interface pode ser redefinida e o sistema pode ser facilmente configurado e treinado com um conjunto de gestos diferentes de forma a serem integrados na solução final.
Resumo:
In this article, we calibrate the Vasicek interest rate model under the risk neutral measure by learning the model parameters using Gaussian processes for machine learning regression. The calibration is done by maximizing the likelihood of zero coupon bond log prices, using mean and covariance functions computed analytically, as well as likelihood derivatives with respect to the parameters. The maximization method used is the conjugate gradients. The only prices needed for calibration are zero coupon bond prices and the parameters are directly obtained in the arbitrage free risk neutral measure.
Resumo:
Dissertação para obtenção do Grau de Mestre em Engenharia Informática
Resumo:
Dissertação para obtenção do Grau de Doutor em Estatística e Gestão do Risco
Resumo:
"Lecture notes in computational vision and biomechanics series, ISSN 2212-9391, vol. 19"