901 results for Learning techniques
Abstract:
The rapid growth of virtualized data centers and cloud hosting services is making the management of physical resources such as CPU, memory, and I/O bandwidth in data center servers increasingly important. Server management now involves dealing with multiple dissimilar applications with varying Service-Level Agreements (SLAs) and multiple resource dimensions. The multiplicity and diversity of resources and applications are rendering administrative tasks more complex and challenging. This thesis aimed to develop a framework and techniques that would help substantially reduce data center management complexity. We specifically addressed two crucial data center operations. First, we precisely estimated the capacity requirements of client virtual machines (VMs) when renting server space in a cloud environment. Second, we proposed a systematic process to efficiently allocate physical resources to hosted VMs in a data center. To realize these dual objectives, accurately capturing the effects of resource allocations on application performance is vital. The benefits of accurate application performance modeling are multifold. Cloud users can size their VMs appropriately and pay only for the resources that they need; service providers can also offer a new charging model based on the VMs' performance instead of their configured sizes. As a result, clients will pay only for the performance they actually experience, while administrators will be able to maximize their total revenue by utilizing application performance models and SLAs. This thesis made the following contributions. First, we identified resource control parameters crucial for distributing physical resources and characterizing contention for virtualized applications in a shared hosting environment.
Second, we explored several modeling techniques and confirmed the suitability of two machine learning tools, Artificial Neural Network and Support Vector Machine, to accurately model the performance of virtualized applications. Moreover, we suggested and evaluated modeling optimizations necessary to improve prediction accuracy when using these modeling tools. Third, we presented an approach to optimal VM sizing by employing the performance models we created. Finally, we proposed a revenue-driven resource allocation algorithm which maximizes the SLA-generated revenue for a data center.
Abstract:
Security defects are common in large software systems because of their size and complexity. Although efficient development processes, testing, and maintenance policies are applied to software systems, a large number of vulnerabilities can still remain despite these measures. Some vulnerabilities stay in a system from one release to the next because they cannot be easily reproduced through testing. These vulnerabilities endanger the security of the systems. We propose vulnerability classification and prediction frameworks based on vulnerability reproducibility. The frameworks are effective at identifying the types and locations of vulnerabilities at an early stage and at improving the security of the software in subsequent versions (referred to as releases). We extend an existing concept of software bug classification to vulnerability classification (easily reproducible and hard to reproduce) to develop a classification framework that differentiates between these vulnerabilities based on code fixes and textual reports. We then investigate the potential correlations between the vulnerability categories, classical software metrics, and other runtime environmental factors of reproducibility to develop a vulnerability prediction framework. The classification and prediction frameworks help developers adopt corresponding mitigation or elimination actions and develop appropriate test cases. The vulnerability prediction framework also helps security experts focus their effort on the top-ranked vulnerability-prone files. As a result, the frameworks decrease the number of attacks that exploit security vulnerabilities in the next versions of the software. To build the classification and prediction frameworks, different machine learning techniques (C4.5 Decision Tree, Random Forest, Logistic Regression, and Naive Bayes) are employed.
The effectiveness of the proposed frameworks is assessed based on collected software security defects of Mozilla Firefox.
Abstract:
Technological evolution has driven an evolution in medicine through computer systems for storing, capturing, and making available medical information. Medical reports are, most of the time, stored as unstructured free text and written in a proprietary vocabulary, which may lead to misinterpretation. Through Semantic Web languages, it is possible to use ontologies to structure and standardize the information in medical reports by adding semantic annotations to it. The information contained in the reports can thus be published on the Web, allowing machines to process it automatically. However, the process of creating ontologies is quite complex, as there is a risk of creating an ontology that does not cover the whole intended domain. This work focuses on the creation of an ontology and its population through NLP and Machine Learning techniques that extract information from medical reports. An application was developed that allows the user to convert reports from digital format to OWL format.
Abstract:
This paper presents our work at the 2016 FIRE CHIS task. Given a CHIS query and a document associated with that query, the task is to classify the sentences in the document as relevant to the query or not, and to further classify the relevant sentences as supporting, neutral toward, or opposing the claim made in the query. In this paper, we present two different approaches to the classification. In the first approach, we implement two models. We first implement an information retrieval model to retrieve the sentences that are relevant to the query, and then use a supervised learning method to train a classification model that classifies the relevant sentences as support, oppose, or neutral. In the second approach, we use only machine learning techniques to learn a model that classifies the sentences into four classes (relevant & support, relevant & neutral, relevant & oppose, irrelevant & neutral). Our submission for CHIS uses the first approach.
Abstract:
Radars are expected to become the main sensors in various civilian applications, especially for autonomous driving. Their success is mainly due to the availability of low cost integrated devices, equipped with compact antenna arrays, and computationally efficient signal processing techniques. This thesis focuses on the study and the development of different deterministic and learning based techniques for colocated multiple-input multiple-output (MIMO) radars. In particular, after providing an overview on the architecture of these devices, the problem of detecting and estimating multiple targets in stepped frequency continuous wave (SFCW) MIMO radar systems is investigated and different deterministic techniques solving it are illustrated. Moreover, novel solutions, based on an approximate maximum likelihood approach, are developed. The accuracy achieved by all the considered algorithms is assessed on the basis of the raw data acquired from low power wideband radar devices. The results demonstrate that the developed algorithms achieve reasonable accuracies, but at the price of different computational efforts. Another important technical problem investigated in this thesis concerns the exploitation of machine learning and deep learning techniques in the field of colocated MIMO radars. In this thesis, after providing a comprehensive overview of the machine learning and deep learning techniques currently being considered for use in MIMO radar systems, their performance in two different applications is assessed on the basis of synthetically generated and experimental datasets acquired through a commercial frequency modulated continuous wave (FMCW) MIMO radar. Finally, the application of colocated MIMO radars to autonomous driving in smart agriculture is illustrated.
Abstract:
The fourth industrial revolution is paving the way for Industrial Internet of Things applications where industrial assets (e.g., robotic arms, valves, pistons) are equipped with a large number of wireless devices (i.e., microcontroller boards that embed sensors and actuators) to enable a plethora of new applications, such as analytics, diagnostics, and monitoring, as well as supervisory and safety control use-cases. Nevertheless, current wireless technologies, such as Wi-Fi, Bluetooth, and even private 5G networks, cannot fulfill all the requirements set by the Industry 4.0 paradigm, thus opening up new 6G-oriented research trends, such as the use of THz frequencies. In light of the above, this thesis (i) provides a broad overview of the main use-cases, requirements, and key enabling wireless technologies foreseen by the fourth industrial revolution, and (ii) proposes innovative contributions, both theoretical and empirical, to enhance the performance of current and future wireless technologies at different levels of the protocol stack. In particular, at the physical layer, signal processing techniques are exploited to analyze two multiplexing schemes, namely Affine Frequency Division Multiplexing and Orthogonal Chirp Division Multiplexing, which seem promising for high-frequency wireless communications. At the medium access layer, three protocols for intra-machine communications are proposed, one based on LoRa at 2.4 GHz and the other two working in the THz band. Different scheduling algorithms for private industrial 5G networks are compared, and two main proposals are described: a decentralized scheme that leverages machine learning techniques to better address aperiodic traffic patterns, and a centralized contention-based design that serves a federated learning industrial application. Results are provided in terms of numerical evaluations, simulation results, and real-world experiments.
Several improvements over the state-of-the-art were obtained, and the description of up-and-running testbeds demonstrates the feasibility of some of the theoretical concepts when considering a real industry plant.
Abstract:
The discovery of new materials and their functions has always been a fundamental component of technological progress. Nowadays, the quest for new materials is stronger than ever: sustainability, medicine, robotics and electronics are all key assets which depend on the ability to create specifically tailored materials. However, designing materials with desired properties is a difficult task, and the complexity of the discipline makes it difficult to identify general criteria. While scientists have developed a set of best practices (often based on experience and expertise), this is still a trial-and-error process. It becomes even more complex when dealing with advanced functional materials. Their properties depend on structural and morphological features, which in turn depend on fabrication procedures and environment, and subtle alterations lead to dramatically different results. Because of this, materials modeling and design is one of the most prolific research fields. Many techniques and instruments are continuously developed to enable new possibilities, in both the experimental and computational realms. Scientists strive to adopt cutting-edge technologies in order to make progress. However, the field is strongly affected by unorganized file management and by a proliferation of custom data formats and storage procedures, in both experimental and computational research. Results are difficult to find, interpret and re-use, and a huge amount of time is spent interpreting and re-organizing data. This also strongly limits the application of data-driven and machine learning techniques. This work introduces possible solutions to the problems described above.
Specifically, it covers developing features for specific classes of advanced materials and using them to train machine learning models and accelerate computational predictions for molecular compounds; developing a method for organizing non-homogeneous materials data; automating the process of using device simulations to train machine learning models; and dealing with scattered experimental data and using them to discover new patterns.
Abstract:
Hematological cancers are a heterogeneous family of diseases that can be divided into leukemias, lymphomas, and myelomas, often called “liquid tumors”. Since they cannot be surgically removed, chemotherapy represents the mainstay of their treatment. However, it still faces several challenges, such as drug resistance and low response rates, and the need for new anticancer agents is compelling. The drug discovery process is lengthy, costly, and prone to high failure rates. With the rapid expansion of biological and chemical "big data", computational techniques such as machine learning tools have been increasingly employed to speed up and economize the whole process. Machine learning algorithms can create complex models that aim to determine the biological activity of compounds against several targets, based on their chemical properties. These models are defined as multi-target Quantitative Structure-Activity Relationship (mt-QSAR) models and can be used to virtually screen small and large chemical libraries for the identification of new molecules with anticancer activity. The aim of my Ph.D. project was to employ machine learning techniques to build an mt-QSAR classification model for the prediction of cytotoxic drugs simultaneously active against 43 hematological cancer cell lines. For this purpose, I first constructed a large and diversified dataset of molecules extracted from the ChEMBL database. Then, I compared the performance of different ML classification algorithms and identified Random Forest as the one returning the best predictions. Finally, I used different approaches to maximize the performance of the model, which achieved an accuracy of 88% by correctly classifying 93% of inactive molecules and 72% of active molecules in a validation set. This model was further applied to the virtual screening of a small dataset of molecules tested in our laboratory, where it showed 100% accuracy in correctly classifying all molecules.
This result is confirmed by our previous in vitro experiments.
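The validation figures quoted above (88% accuracy, 93% of inactive and 72% of active molecules correctly classified) can be related to one another with a little arithmetic. The sketch below is only illustrative: it assumes all three percentages refer to the same validation set and ignores rounding, and the function names are hypothetical rather than taken from the thesis. Under those assumptions, overall accuracy is a weighted average of the two per-class rates, so the share of active molecules in the validation set can be estimated.

```python
def accuracy(sensitivity, specificity, active_fraction):
    """Overall accuracy as a weighted average of the per-class hit rates.

    sensitivity: fraction of active molecules classified correctly
    specificity: fraction of inactive molecules classified correctly
    active_fraction: share of active molecules in the evaluated set
    """
    return sensitivity * active_fraction + specificity * (1 - active_fraction)


def implied_active_fraction(acc, sensitivity, specificity):
    """Solve the weighted-average relation for the share of active molecules."""
    if sensitivity == specificity:
        raise ValueError("class balance is undetermined when both rates are equal")
    return (specificity - acc) / (specificity - sensitivity)


# Figures quoted in the abstract; treating them as exact is an assumption.
p = implied_active_fraction(acc=0.88, sensitivity=0.72, specificity=0.93)
print(f"implied share of active molecules: {p:.3f}")
print(f"accuracy recomputed from that share: {accuracy(0.72, 0.93, p):.2f}")
```

Read this way, the reported numbers would be consistent with a validation set in which roughly a quarter of the molecules are active, which is why accuracy sits closer to the inactive-class rate than to the active-class rate.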
Abstract:
In this thesis we discuss the expansion of an existing project, called CHIMeRA, which is a comprehensive biomedical network, and the analysis of its sub-components using graph theory. We describe how it is structured internally, which existing databases it retrieves information from, and which machine learning techniques are used to produce new knowledge. We also introduce a new technique for graph exploration that aims to speed up the network cover time under the condition that the analyzed graph is stellar; if this condition is satisfied, the improvement in performance over the conventional exploration technique is substantial. We show that the stellar structure is highly recurrent in the sub-networks that queries generate in CHIMeRA, which makes this technique even more interesting. Finally, we describe the convenience of using the CHIMeRA network for research purposes and what it could become in the very near future.
Abstract:
This work describes the development of a product that autonomously constructs pages from a predefined plan. The process involves Machine Learning techniques for fitting text to shapes, Beam Search for associations, and Deep Learning for autonomous cropping of images.
Abstract:
Privacy issues and data scarcity in the PET field call for efficient methods to expand datasets via synthetic generation of new data that are realistic and cannot be traced back to real patients. In this thesis, machine learning techniques were applied to 1001 amyloid-beta PET images that had been evaluated for a diagnosis of Alzheimer’s disease: the evaluations were 540 positive, 457 negative and 4 unknown. The Isomap algorithm was used as a manifold learning method to reduce the dimensions of the PET dataset; a numerical scale-free interpolation method was applied to invert the dimensionality reduction map. The interpolant was tested on the PET images via leave-one-out cross-validation (LOOCV), where the removed images were compared with the reconstructed ones using the mean SSIM index (MSSIM = 0.76 ± 0.06). The effectiveness of this measure is questioned, since it indicated slightly higher performance for a comparison method using PCA (MSSIM = 0.79 ± 0.06), which gave reconstructed images of clearly poorer quality than those recovered by the numerical inverse mapping. Ten synthetic PET images were generated and, after having been mixed with ten originals, were sent to a team of clinicians for visual assessment of their realism; no significant agreement was found either between the clinicians and the true image labels or among the clinicians, meaning that original and synthetic images were indistinguishable. The future perspective of this thesis points to the improvement of the amyloid-beta PET research field by increasing available data, overcoming the constraints of data acquisition and privacy issues. Potential improvements can be achieved via refinements of the manifold learning and inverse mapping stages of the PET image analysis, by exploring different combinations of algorithm parameters and by applying other non-linear dimensionality reduction algorithms. A final prospect of this work is the search for new methods to assess image reconstruction quality.
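The leave-one-out evaluation described above can be sketched in a few lines. This is a simplified illustration, not the thesis's procedure: the SSIM here is computed over a single global window on flattened images (the mean SSIM in the abstract is normally averaged over local windows), the constants `k1` and `k2` are the commonly used defaults, and `mean_of_others` is a hypothetical placeholder for the Isomap-plus-inverse-mapping reconstruction, which is not reproduced here.

```python
from statistics import fmean


def ssim_global(x, y, data_range=1.0, k1=0.01, k2=0.03):
    """Single-window SSIM between two equally sized, flattened images."""
    if len(x) != len(y) or not x:
        raise ValueError("images must be non-empty and the same size")
    c1 = (k1 * data_range) ** 2  # stabilizes the luminance term
    c2 = (k2 * data_range) ** 2  # stabilizes the contrast/structure term
    mx, my = fmean(x), fmean(y)
    vx = fmean((a - mx) * (a - mx) for a in x)
    vy = fmean((b - my) * (b - my) for b in y)
    cov = fmean((a - mx) * (b - my) for a, b in zip(x, y))
    numerator = (2 * mx * my + c1) * (2 * cov + c2)
    denominator = (mx * mx + my * my + c1) * (vx + vy + c2)
    return numerator / denominator


def mean_of_others(train):
    """Placeholder reconstructor (assumption): pixel-wise mean of the other images."""
    return [fmean(pixels) for pixels in zip(*train)]


def loocv_mean_ssim(images, reconstruct=mean_of_others):
    """Hold out each image in turn, rebuild it from the rest, and average the SSIM."""
    scores = []
    for i, held_out in enumerate(images):
        train = images[:i] + images[i + 1:]
        scores.append(ssim_global(held_out, reconstruct(train)))
    return fmean(scores)


# Toy 4-pixel "images" with made-up intensities in [0, 1].
toy = [[0.0, 0.5, 1.0, 0.5], [0.1, 0.5, 0.9, 0.4], [0.0, 0.6, 1.0, 0.5]]
print(loocv_mean_ssim(toy))
```

Any reconstruction method can be passed as `reconstruct`, which makes it straightforward to compare two methods on the same held-out images, as the abstract does for the inverse mapping and PCA.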
Abstract:
Recent studies of mobile Web trends show the continued explosion of mobile-friendly content. However, the large number and heterogeneity of mobile devices pose several challenges for Web programmers, who want automatic delivery of context and adaptation of the content to mobile devices. Hence, the device detection phase assumes an important role in this process. In this chapter, the authors compare the most used approaches for mobile device detection. Based on this study, they present an architecture for detecting and delivering uniform m-Learning content to students in a Higher School. The authors focus mainly on the XML device capabilities repository and on the REST API Web Service for dealing with device data. In the former, the authors detail the respective capabilities schema and present a new caching approach. In the latter, they present an extension of the current API for dealing with it. Finally, the authors validate their approach by presenting the overall data and statistics collected through the Google Analytics service, in order to better understand the adherence to the mobile Web interface, its evolution over time, and the main weaknesses.
Analysis and evaluation of techniques for the extraction of classes in the ontology learning process
Abstract:
This paper analyzes and evaluates, in the context of ontology learning, some techniques for identifying and extracting candidate terms for the classes of a taxonomy. In addition, this work points out some inconsistencies that may occur in the preprocessing of text corpora, and proposes techniques for obtaining good candidate terms for the classes of a taxonomy.
Abstract:
This case study explored strategies and techniques for assisting individuals with learning disabilities in their academic achievement. Of particular focus was how a literacy-based program, titled The Spring Reading Program, utilizes effective tactics and approaches that result in academic growth. The Spring Reading Program, offered by the Learning Disabilities Association of Niagara Region (LDANR) in partnership with John McNamara from Brock University, supports children with reading disabilities academically. In addition, the program helps children increase their confidence and motivation towards literacy. I began this study by outlining the importance of reading, followed by an exploration of what educators and researchers have demonstrated regarding effective literacy instruction for children with learning disabilities. I studied effective strategies and techniques in the Spring Reading Program by conducting a qualitative case study of the program. This case study subsequently presents, in depth, four specific strategies: hands-on activities, motivation, engagement, and one-on-one instruction. The effectiveness of each strategy is demonstrated through the literature and examples from the Spring Reading Program.
Abstract:
The aim of this study is to show the importance of two classification techniques, viz. decision tree and clustering, in the prediction of learning disabilities (LD) in school-age children. LDs affect about 10 percent of all children enrolled in schools. The problems of children with specific learning disabilities have been a cause of concern to parents and teachers for some time. Decision trees and clustering are powerful and popular tools used for classification and prediction in data mining. Different rules extracted from the decision tree are used for the prediction of learning disabilities. Clustering is the assignment of a set of observations into subsets, called clusters, which are useful in finding the different signs and symptoms (attributes) present in the LD-affected child. In this paper, the J48 algorithm is used for constructing the decision tree and the K-means algorithm is used for creating the clusters. By applying these classification techniques, LD in any child can be identified.
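The clustering step mentioned above can be illustrated with a minimal K-means sketch in plain Python. It is not the paper's implementation: the two-attribute points are made-up scores, the initial centroids are chosen by hand so the run is deterministic, and the function name `kmeans` is only illustrative.

```python
import math


def kmeans(points, centroids, max_iter=100):
    """Alternate nearest-centroid assignment and centroid updates until stable."""
    centroids = [tuple(c) for c in centroids]

    def nearest(p):
        # Index of the centroid closest to point p (Euclidean distance).
        return min(range(len(centroids)), key=lambda i: math.dist(p, centroids[i]))

    for _ in range(max_iter):
        # Assignment step: group every point with its closest centroid.
        clusters = [[] for _ in centroids]
        for p in points:
            clusters[nearest(p)].append(p)
        # Update step: move each centroid to the mean of its cluster;
        # an empty cluster keeps its previous centroid.
        updated = [
            tuple(sum(col) / len(cluster) for col in zip(*cluster)) if cluster else centroids[i]
            for i, cluster in enumerate(clusters)
        ]
        if updated == centroids:
            break
        centroids = updated
    return centroids, [nearest(p) for p in points]


# Hypothetical attribute scores for six children (not data from the paper).
scores = [(0, 0), (0, 1), (1, 0), (9, 9), (9, 10), (10, 9)]
centers, labels = kmeans(scores, centroids=[(0, 0), (10, 10)])
print(labels)
```

Each resulting cluster groups observations with similar attribute values, which is the sense in which the abstract uses clusters to find signs and symptoms that occur together.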