785 results for learning analytics framework


Relevance:

30.00%

Publisher:

Abstract:

The fast development of Information and Communication Technologies (ICT) offers new opportunities to realize future smart cities. To understand, manage, and forecast a city's behavior, it is necessary to analyze different kinds of data from a wide variety of data acquisition systems. The aim of this research activity, in the framework of Data Science and Complex Systems Physics, is to provide stakeholders with new knowledge tools to improve the sustainability of mobility demand in future cities. From this perspective, the governance of mobility demand generated by large tourist flows is becoming a vital issue for the quality of life in the historical centers of Italian cities, and it will worsen in the near future due to the continuing globalization process. Another critical theme is sustainable mobility, which aims to reduce the use of private transportation in cities and improve multimodal mobility. We analyze the statistical properties of urban mobility in Venice, Rimini, and Bologna using different datasets provided by companies and local authorities. We develop algorithms and tools for cartography extraction, trip reconstruction, multimodality classification, and mobility simulation. We show the existence of characteristic mobility paths and statistical properties that depend on the means of transport and the type of user. Finally, we use our results to model and simulate the overall behavior of cars moving in the Emilia-Romagna region and of pedestrians moving in Venice, with software able to replicate in silico the demand for mobility and its dynamics.

Relevance:

30.00%

Publisher:

Abstract:

In the last decade, manufacturing companies have been facing two significant challenges. First, digitalization requires the adoption of Industry 4.0 technologies and enables the creation of smart, connected, self-aware, and self-predictive factories. Second, the attention to sustainability requires evaluating and reducing the impact of the implemented solutions from economic and social points of view. In manufacturing companies, the maintenance of physical assets plays a critical role. Increasing the reliability and availability of production systems minimizes system downtime; in addition, proper system functioning avoids production waste and potentially catastrophic accidents. Digitalization and new ICT technologies have assumed a relevant role in maintenance strategies. They allow the health condition of machinery to be assessed at any point in time. Moreover, they allow the future behavior of machinery to be predicted, so that maintenance interventions can be planned and the useful life of components can be exploited until just before they fail. This dissertation provides insights on Predictive Maintenance goals and tools in Industry 4.0 and proposes a novel data acquisition, processing, sharing, and storage framework that addresses typical issues that machine producers and users encounter. The research elaborates on two research questions that narrow down the potential approaches to data acquisition, processing, and analysis for fault diagnostics in evolving environments. The research activity is developed according to a research framework in which the research questions are addressed by research levers that are explored according to research topics.
Each topic requires a specific set of methods and approaches; however, the overarching methodological approach presented in this dissertation includes three fundamental aspects: the maximization of the quality level of input data, the use of Machine Learning methods for data analysis, and the use of case studies deriving from both controlled environments (laboratory) and real-world instances.

Relevance:

30.00%

Publisher:

Abstract:

Biology is now a “Big Data Science” thanks to technological advancements that allow the characterization of the whole macromolecular content of a cell or a collection of cells. This opens interesting perspectives, but only a small portion of these data can be experimentally characterized. Hence the demand for accurate and efficient computational tools for the automatic annotation of biological molecules. This is even more true for membrane proteins, the focus of my research project, which led to the development of two machine learning-based methods: BetAware-Deep and SVMyr. BetAware-Deep is a tool for the detection and topology prediction of transmembrane beta-barrel proteins found in Gram-negative bacteria. These proteins are involved in many biological processes and are primary candidates as drug targets. BetAware-Deep exploits the combination of a deep learning framework (bidirectional long short-term memory) and a probabilistic graphical model (grammatical-restrained hidden conditional random field). Moreover, it introduces a modified formulation of the hydrophobic moment, designed to include evolutionary information. BetAware-Deep outperformed all the available methods in topology prediction and reported high scores in the detection task. Glycine myristoylation in Eukaryotes is the binding of a myristic acid to an N-terminal glycine. SVMyr is a fast method based on support vector machines, designed to predict this modification in proteome-scale datasets. It takes octapeptides as input and exploits computational scores derived from experimental examples and mean physicochemical features. SVMyr outperformed all the available methods for co-translational myristoylation prediction. In addition, it allows (as a unique feature) the prediction of post-translational myristoylation. Both tools described here were designed with the best practices for the development of machine learning-based tools, as outlined by the bioinformatics community, in mind.
Moreover, they are made available via user-friendly web servers. All this makes them valuable tools for filling the gap between sequence data and annotated data.

Relevance:

30.00%

Publisher:

Abstract:

The idea behind the project is to develop a methodology for analyzing and developing techniques for the diagnosis and prediction of the state of charge and state of health of lithium-ion batteries for automotive applications. For lithium-ion batteries, residual functionality is measured in terms of state of health; however, this value cannot be directly associated with a measurable quantity, so it must be estimated. The development of the algorithms is based on identifying the causes of battery degradation, in order to model and predict its trend. Therefore, models have been developed that are able to predict the electrical, thermal, and aging behavior. In addition to the model, it was necessary to develop algorithms capable of monitoring the state of the battery, both online and offline. Online monitoring was made possible by algorithms based on Kalman filters, which allow the system state to be estimated in real time. Machine learning algorithms, which allow offline analysis of battery deterioration using a statistical approach, make it possible to analyze information from the entire fleet of vehicles. Both systems work in synergy in order to achieve the best performance. Validation was performed with laboratory tests on different batteries and under different conditions. The model made it possible to reduce the time required for the experimental tests. Some specific phenomena were tested in the laboratory, and the other cases were artificially generated.
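
As an illustration of the Kalman-filter-based online estimation mentioned in the abstract above, the following is a minimal sketch of one step of a scalar Kalman filter tracking state of charge (SOC). The coulomb-counting process model, the direct noisy SOC measurement, the function name `kalman_soc_step`, and the default noise variances are all assumptions made for illustration; they are not taken from the thesis.

```python
def kalman_soc_step(soc, p, current_a, dt_s, capacity_as, z_soc, q=1e-6, r=1e-2):
    """One predict/update step of a scalar Kalman filter for state of charge.

    soc, p      : prior SOC estimate (0..1) and its variance
    current_a   : current in amperes (positive means discharging)
    dt_s        : time step in seconds
    capacity_as : cell capacity in ampere-seconds
    z_soc       : noisy SOC measurement (assumed here to be available directly,
                  e.g. derived from a voltage lookup)
    q, r        : assumed process and measurement noise variances
    """
    # Predict: coulomb counting moves the estimate, process noise inflates variance.
    soc_pred = soc - current_a * dt_s / capacity_as
    p_pred = p + q

    # Update: blend prediction and measurement according to the Kalman gain.
    gain = p_pred / (p_pred + r)
    soc_new = soc_pred + gain * (z_soc - soc_pred)
    p_new = (1.0 - gain) * p_pred
    return soc_new, p_new


if __name__ == "__main__":
    soc, p = 0.90, 0.05
    # Hypothetical measurements taken once per second at 2 A discharge.
    for z in (0.88, 0.89, 0.87):
        soc, p = kalman_soc_step(soc, p, current_a=2.0, dt_s=1.0,
                                 capacity_as=3600.0 * 50.0, z_soc=z)
    print(soc, p)
```

A real battery model would typically be nonlinear (for example, an equivalent-circuit model with an extended or unscented Kalman filter); the scalar linear case is used here only to show the predict/update structure.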

Relevance:

30.00%

Publisher:

Abstract:

In this thesis we discuss in what ways computational logic (CL) and data science (DS) can jointly contribute to the management of knowledge within the scope of modern and future artificial intelligence (AI), and how technically sound software technologies can be realised along the path. An agent-oriented mindset permeates the whole discussion, stressing the pivotal role of autonomous agents in exploiting both means to reach higher degrees of intelligence. Accordingly, the goals of this thesis are manifold. First, we elicit the analogies and differences between CL and DS, looking for possible synergies and complementarities along four major knowledge-related dimensions, namely representation, acquisition (a.k.a. learning), inference (a.k.a. reasoning), and explanation. In this regard, we propose a conceptual framework through which bridges between these disciplines can be described and designed. We then survey the current state of the art of AI technologies with respect to their capability to support bridging CL and DS in practice. After identifying gaps and opportunities, we propose the notion of logic ecosystem as the new conceptual, architectural, and technological solution supporting the incremental integration of symbolic and sub-symbolic AI. Finally, we discuss how our notion of logic ecosystem can be reified into actual software technology and extended towards many DS-related directions.

Relevance:

30.00%

Publisher:

Abstract:

The importance of networks, in their broad sense, is rapidly and massively growing in modern-day society thanks to the unprecedented communication capabilities offered by technology. In this context, the radio spectrum will be a primary resource to be preserved and not wasted. Therefore, the need for intelligent and automatic systems for in-depth spectrum analysis and monitoring will pave the way for a new set of opportunities and potential challenges. This thesis proposes a novel framework for automatic spectrum patrolling and the extraction of wireless network analytics. It aims to enhance the physical layer security of next-generation wireless networks through the extraction and analysis of dedicated analytical features. The framework consists of a spectrum sensing phase, carried out by a patrol composed of numerous radio-frequency (RF) sensing devices, followed by the extraction of a set of wireless network analytics. The methodology developed is blind, allowing spectrum sensing and analytics extraction for a network whose key features (i.e., number of nodes, physical layer signals, medium access control (MAC) and routing protocols) are unknown. Because of the wireless medium, over-the-air signals captured by the sensors are mixed; therefore, blind source separation (BSS) and measurement association are used to estimate the number of sources and separate the traffic patterns. After the separation, we put together a set of methodologies for extracting useful features of the wireless network, i.e., its logical topology, the application-level traffic patterns generated by the nodes, and their position. The whole framework is validated on an ad hoc wireless network accounting for the MAC protocol, packet collisions, node mobility, the spatial density of sensors, and channel impairments such as path loss, shadowing, and noise.
The numerical results obtained by extensive and exhaustive simulations show that the proposed framework is consistent and can achieve the required performance.

Relevance:

30.00%

Publisher:

Abstract:

In the framework of industrial problems, Constrained Optimization is known to have very good overall modeling capability and performance, and it stands as one of the most powerful, explored, and exploited tools for addressing prescriptive tasks. The number of applications is huge, ranging from logistics to transportation, packing, production, telecommunication, scheduling, and much more. The main reason behind this success is the remarkable effort made in the last decades by the OR community to develop realistic models and devise exact or approximate methods to solve the widest variety of constrained or combinatorial optimization problems, together with the spread of computational power and of easily accessible OR software and resources. On the other hand, technological advancements have led to a wealth of data never seen before and increasingly push towards methods able to extract useful knowledge from it; among the data-driven methods, Machine Learning techniques appear to be among the most promising, thanks to their successes in domains such as Image Recognition, Natural Language Processing, and game playing, but also to the amount of research involved. The purpose of the present research is to study how Machine Learning and Constrained Optimization can be used together to build systems able to leverage the strengths of both methods: this would open the way to exploiting decades of research on resolution techniques for constrained optimization problems (COPs) and to constructing models able to adapt and learn from available data. In the first part of this work, we survey the existing techniques and classify them according to the type, method, or scope of the integration; subsequently, we introduce Moving Target, a novel and general algorithm devised to inject knowledge into learning models through constraints. In the last part of the thesis, two applications stemming from real-world projects and carried out in collaboration with Optit are presented.

Relevance:

30.00%

Publisher:

Abstract:

With the CERN LHC program underway, there has been an acceleration of data growth in the High Energy Physics (HEP) field, and the usage of Machine Learning (ML) in HEP will be critical during the HL-LHC program, when the data produced will reach the exascale. ML techniques have been successfully used in many areas of HEP; nevertheless, developing an ML project and implementing it for production use is a highly time-consuming task that requires specific skills. Complicating this scenario is the fact that HEP data is stored in the ROOT data format, which is mostly unknown outside of the HEP community. The work presented in this thesis focuses on the development of an ML as a Service (MLaaS) solution for HEP, aiming to provide a cloud service that allows HEP users to run ML pipelines via HTTP calls. These pipelines are executed using the MLaaS4HEP framework, which allows reading data, processing data, and training ML models directly from ROOT files of arbitrary size, from local or distributed data sources. Such a solution provides HEP users who are not ML experts with a tool that allows them to apply ML techniques in their analyses in a streamlined manner. Over the years, the MLaaS4HEP framework has been developed, validated, and tested, and new features have been added. A first MLaaS solution was developed by automating the deployment of a platform equipped with the MLaaS4HEP framework. Then, a service with APIs was developed, so that a user, after being authenticated and authorized, can submit MLaaS4HEP workflows that produce trained ML models ready for the inference phase. A working prototype of this service is currently running on a virtual machine of INFN-Cloud and is compliant with the requirements for inclusion in the INFN Cloud portfolio of services.

Relevance:

30.00%

Publisher:

Abstract:

Reinforcement Learning (RL) provides a powerful framework to address sequential decision-making problems in which the transition dynamics are unknown or too complex to be represented. The RL approach is based on speculating about the best decision to make given sample estimates obtained from previous interactions, a recipe that has led to several breakthroughs in various domains, ranging from game playing to robotics. Despite their success, current RL methods hardly generalize from one task to another, and achieving the kind of generalization obtained through unsupervised pre-training in non-sequential problems seems unthinkable. Unsupervised RL has recently emerged as a way to improve the generalization of RL methods. Just like its non-sequential counterpart, the unsupervised RL framework comprises two phases: an unsupervised pre-training phase, in which the agent interacts with the environment without external feedback, and a supervised fine-tuning phase, in which the agent aims to efficiently solve a task in the same environment by exploiting the knowledge acquired during pre-training. In this thesis, we study unsupervised RL via state entropy maximization, in which the agent uses the unsupervised interactions to pre-train a policy that maximizes the entropy of its induced state distribution. First, we provide a theoretical characterization of the learning problem by considering a convex RL formulation that subsumes state entropy maximization. Our analysis shows that maximizing the state entropy in finite trials is inherently harder than RL. Then, we study the state entropy maximization problem from an optimization perspective. In particular, we show that the primal formulation of the corresponding optimization problem can be (approximately) addressed through tractable linear programs.
Finally, we provide the first practical methodologies for state entropy maximization in complex domains, both when the pre-training takes place in a single environment and when it takes place in multiple environments.
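
To make the pre-training objective in the abstract above concrete: for a policy that induces a state distribution d over a discrete state space, the state entropy is H(d) = -sum over s of d(s) log d(s). The sketch below computes a simple plug-in estimate of this quantity from a list of visited states. The function name and the plug-in estimator are assumptions chosen for illustration; the thesis's own estimators and algorithms for complex domains are not reproduced here.

```python
import math
from collections import Counter


def empirical_state_entropy(states):
    """Plug-in estimate of the entropy (in nats) of the visited-state distribution.

    `states` is a non-empty sequence of hashable state identifiers collected
    while running a policy. Each state's probability is estimated by its
    visitation frequency.
    """
    counts = Counter(states)
    total = len(states)
    return -sum((c / total) * math.log(c / total) for c in counts.values())


if __name__ == "__main__":
    # A policy that keeps revisiting one state scores lower than one that
    # spreads its visits evenly over the state space.
    narrow = ["s0", "s0", "s0", "s1"]
    uniform = ["s0", "s1", "s2", "s3"]
    print(empirical_state_entropy(narrow), empirical_state_entropy(uniform))
```

Under this measure, a policy whose visits are spread uniformly over the reachable states attains the maximum value, which is the behavior the unsupervised pre-training phase is meant to encourage.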

Relevance:

30.00%

Publisher:

Abstract:

The recent widespread use of social media platforms and web services has led to a vast amount of behavioral data that can be used to model socio-technical systems. A significant part of this data can be represented as graphs or networks, which have become the prevalent mathematical framework for studying the structure and dynamics of complex interacting systems. However, analyzing and understanding these data presents new challenges due to their increasing complexity and diversity. For instance, the characterization of real-world networks requires accounting for their temporal dimension and incorporating higher-order interactions beyond the traditional pairwise formalism. The ongoing growth of AI has led to the integration of traditional graph mining techniques with representation learning and low-dimensional embeddings of networks to address current challenges. These methods capture the underlying similarities and geometry of graph-shaped data, generating latent representations that enable the resolution of various tasks, such as link prediction, node classification, and graph clustering. As these techniques gain popularity, there is also growing concern about their responsible use. In particular, there has been an increased emphasis on addressing the limitations of interpretability in graph representation learning. This thesis contributes to the advancement of knowledge in the field of graph representation learning and has potential applications in a wide range of complex systems domains. We initially focus on forecasting problems related to face-to-face contact networks with time-varying graph embeddings. Then, we study hyperedge prediction and reconstruction with simplicial complex embeddings. Finally, we analyze the problem of interpreting latent dimensions in node embeddings for graphs.
The proposed models are extensively evaluated in multiple experimental settings, and the results demonstrate their effectiveness and reliability, achieving state-of-the-art performance and providing valuable insights into the properties of the learned representations.

Relevance:

30.00%

Publisher:

Abstract:

The rapid progression of biomedical research, coupled with the explosion of scientific literature, has generated an urgent need for efficient and reliable systems of knowledge extraction. This dissertation addresses this challenge through a concentrated investigation of digital health, Artificial Intelligence, and specifically the potential of Machine Learning and Natural Language Processing (NLP) to expedite systematic literature reviews and refine the knowledge extraction process. The surge of COVID-19 complicated the efforts of scientists, policymakers, and medical professionals to identify pertinent articles and assess their scientific validity. This thesis presents a solution in the form of the COKE Project, an initiative that interlaces machine reading with the rigorous protocols of Evidence-Based Medicine to streamline knowledge extraction. In the framework of the COKE (“COVID-19 Knowledge Extraction framework for next-generation discovery science”) Project, this thesis aims to underscore the capacity of machine reading to create knowledge graphs from scientific texts. The project makes innovative use of NLP techniques such as a BERT + bi-LSTM language model. This combination is employed to detect and categorize elements within medical abstracts, thereby enhancing the systematic literature review process. The COKE project's outcomes show that NLP, when used in a judiciously structured manner, can significantly reduce the time and effort required to produce medical guidelines. These findings are particularly salient during times of medical emergency, like the COVID-19 pandemic, when quick and accurate research results are critical.

Relevance:

30.00%

Publisher:

Abstract:

Day by day, machine learning is changing our lives in ways we could not have imagined just 5 years ago. ML expertise is increasingly in demand, yet only a limited number of ML engineers are available on the job market, and their knowledge is always limited by an inherent characteristic of theirs: they are human. This thesis explores the possibilities offered by meta-learning, a new field in ML that takes learning a level higher: models are trained on other models' training data, such as features of the dataset they were trained on, inference times, and the performance obtained, to try to understand the relationship between a good model and the way it was obtained. The so-called metamodel was trained on data collected by OpenML, the largest ML metadata platform publicly available today. Datasets were analyzed to obtain meta-features that describe them, which were then tied to model performance in a regression task. The resulting metamodel predicts the expected performance of a given model type (e.g., a random forest) on a given ML task (e.g., classification on the UCI census dataset). This research was then integrated into a custom-made AutoML framework, to show that meta-learning is not an end in itself but can be used to further advance ML research. Encoding ML engineering expertise in a model allows better, faster, and more impactful ML applications across the whole world, while reducing the cost that is inevitably tied to human engineers.

Relevance:

30.00%

Publisher:

Abstract:

Unmanned Aerial Vehicles (UAVs) equipped with cameras have been rapidly deployed in a wide range of applications, such as smart cities, agriculture, and search and rescue. Even though UAV datasets exist, the number of open, high-quality UAV datasets is limited. We therefore aim to overcome this lack of high-quality annotated data by developing a simulation framework for the parametric generation of synthetic data. The framework accepts input via a serializable format. The input specifies which environment preset is used and which objects are to be placed in the environment, along with their position and orientation as well as additional information such as object color and size. The result is an environment that is able to produce typical UAV data: an RGB image from the UAV's camera and the altitude, roll, pitch, and yaw of the UAV. Beyond the image generation process, we improve the photorealism of the resulting image data by using synthetic-to-real transfer learning methods. Transfer learning focuses on storing knowledge gained while solving one problem and applying it to a different, although related, problem. This approach has been widely researched in other related fields, and results show it to be an interesting area to investigate. Since simulated images are easy to create and synthetic-to-real translation has shown good-quality results, we are able to generate pseudo-realistic images. Furthermore, object labels are inherently given, so we are capable of extending the existing UAV datasets with realistic-quality images and high-resolution metadata. During the development of this thesis we have been able to produce a result of 68.4% on UAVid. This can be considered a new state-of-the-art result on this dataset.
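
The abstract above describes a serializable input that names an environment preset and lists objects with position, orientation, color, and size. A minimal sketch of what such an input could look like is given below. The choice of JSON, the class names `SceneSpec` and `SceneObject`, and every field name are hypothetical and introduced only for illustration; the abstract does not specify the actual format.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Tuple


@dataclass
class SceneObject:
    # All field names are hypothetical; the abstract only lists the kinds of
    # information carried (position, orientation, color, size).
    kind: str
    position: Tuple[float, float, float]
    orientation_deg: Tuple[float, float, float]
    color: str
    size: float


@dataclass
class SceneSpec:
    environment_preset: str
    objects: List[SceneObject]


def to_json(spec: SceneSpec) -> str:
    """Serialize a scene specification to a JSON string."""
    return json.dumps(asdict(spec))


def from_json(text: str) -> SceneSpec:
    """Rebuild a scene specification from its JSON form."""
    raw = json.loads(text)
    objects = [
        SceneObject(
            kind=o["kind"],
            position=tuple(o["position"]),  # JSON arrays come back as lists
            orientation_deg=tuple(o["orientation_deg"]),
            color=o["color"],
            size=o["size"],
        )
        for o in raw["objects"]
    ]
    return SceneSpec(environment_preset=raw["environment_preset"], objects=objects)


if __name__ == "__main__":
    spec = SceneSpec(
        environment_preset="urban",
        objects=[SceneObject("car", (10.0, 4.0, 0.0), (0.0, 0.0, 90.0), "red", 4.5)],
    )
    print(to_json(spec))
```

Keeping the scene description as plain data like this makes it straightforward to generate many parameter variations programmatically, which is the point of a parametric data generator.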

Relevance:

30.00%

Publisher:

Abstract:

A global Italian pharmaceutical company has to provide two work environments that serve different needs. The environments will allow solutions to be developed in a controlled, secure, and at the same time independent manner on a state-of-the-art enterprise cloud platform. The need to develop two different environments is dictated by the needs of the working units. The first environment is designed to facilitate the creation of applications related to genomics and is therefore aimed mainly at data scientists. This environment is capable of consuming, producing, retrieving, and incorporating data; furthermore, it will support the programming languages most used for genomic applications (e.g., Python, R). The proposal was to obtain a pool of ready-to-go virtual machines with different architectures, to provide the best performance for the job that needs to be carried out. The second environment is more traditional in character: it obtains, via an ETL (Extract-Transform-Load) process, a global data model resembling a classical relational structure. It will provide the major BI operations (e.g., analytics, performance measures, reports) that can be leveraged both for application analysis and for internal usage. Since both architectures will maintain large amounts of data, covering not only pharmaceutical information but also internal company information, it will be possible to process the data with reporting and analytics tools and also to apply data mining and machine learning technologies to exploit the information they contain. The thesis presents the proposals, the implementations, the technologies and platforms used, and future work for the environments discussed above.

Relevance:

20.00%

Publisher:

Abstract:

Resource specialisation, although a fundamental component of ecological theory, is employed in disparate ways. Most definitions derive from simple counts of resource species. We build on recent advances in ecophylogenetics and null model analysis to propose a concept of specialisation that comprises affinities among resources as well as their co-occurrence with consumers. In the distance-based specialisation index (DSI), specialisation is measured as relatedness (phylogenetic or otherwise) of resources, scaled by the null expectation of random use of locally available resources. Thus, specialists use significantly clustered sets of resources, whereas generalists use over-dispersed resources. Intermediate species are classed as indiscriminate consumers. The effectiveness of this approach was assessed with differentially restricted null models, applied to a data set of 168 herbivorous insect species and their hosts. Incorporation of plant relatedness and relative abundance greatly improved specialisation measures compared to taxon counts or simpler null models, which overestimate the fraction of specialists, a problem compounded by insufficient sampling effort. This framework disambiguates the concept of specialisation with an explicit measure applicable to any mode of affinity among resource classes, and is also linked to ecological and evolutionary processes. This will enable a more rigorous deployment of ecological specialisation in empirical and theoretical studies.
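
The idea in the abstract above, measuring the relatedness of the resources a consumer uses and scaling it by a null expectation of random use of locally available resources, can be sketched as a standardised effect size of mean pairwise distance. The code below is only an illustrative approximation: the unweighted random-draw null model, the function names, and the sign convention (negative means clustered, i.e. specialist; positive means over-dispersed, i.e. generalist) are assumptions, and the published DSI may differ, for example by weighting the null model by resource abundance.

```python
import itertools
import random
import statistics


def mean_pairwise_distance(resources, dist):
    """Mean distance over all unordered pairs of resources (needs >= 2 resources).

    `dist` maps frozenset({a, b}) to the distance (phylogenetic or otherwise)
    between resources a and b.
    """
    pairs = list(itertools.combinations(resources, 2))
    return sum(dist[frozenset(pair)] for pair in pairs) / len(pairs)


def standardised_specialisation(used, available, dist, n_null=999, seed=0):
    """Observed mean pairwise distance, standardised against a null model.

    The null model (an assumption of this sketch) draws the same number of
    resources uniformly at random, without replacement, from `available`.
    """
    rng = random.Random(seed)
    observed = mean_pairwise_distance(used, dist)
    null = [
        mean_pairwise_distance(rng.sample(available, len(used)), dist)
        for _ in range(n_null)
    ]
    return (observed - statistics.mean(null)) / statistics.stdev(null)


if __name__ == "__main__":
    # Hypothetical host plants: two genera, close within a genus, far between.
    plants = ["a1", "a2", "b1", "b2"]
    dist = {
        frozenset(pair): (1.0 if pair[0][0] == pair[1][0] else 10.0)
        for pair in itertools.combinations(plants, 2)
    }
    print(standardised_specialisation(["a1", "a2"], plants, dist))
```

In this toy example a consumer feeding on two congeneric hosts yields a negative value, because its hosts are more closely related than random pairs drawn from the local pool.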