856 results for Data Driven Clustering
Abstract:
This research explores how the infrastructure and the uses of eBird, one of the largest citizen science projects in the world, develop and evolve over time and space. We focus on eBird's work with two of its Latin American partners, Mexico and Peru, each with a web portal managed by local organizations. eBird, now a large global network of partnerships, gives citizens around the world the opportunity to contribute to science and to bird conservation by uploading their observations online. These observations are managed and stored in a unified, global database accessible to anyone interested in birds and their conservation. Users also take advantage of the platform's features to organize and visualize their own data and those of others. The study is based on a qualitative methodology combining observation of the web platforms and semi-structured interviews with members of the Cornell Lab of Ornithology, the eBird team, and members of the local partner organizations responsible for eBird Peru and eBird Mexico. We analyze eBird as an infrastructure, considering its social and technical aspects together, as a whole. We also explore the variety of ways in which the platform and its data are used by its diverse users. Three main themes emerge: the importance of collaboration as a philosophy underlying eBird's development; the broadening of eBird's relationships and connections through its partnerships; and the growth in participation and in the volume of data. Finally, over time we observe an evolution of the data and of its different uses, and of what eBird represents as an infrastructure.
Abstract:
Model predictive control (MPC) has often been referred to in the literature as a potential method for more efficient control of building heating systems. Though a significant performance improvement can be achieved with an MPC strategy, the complexity introduced to the commissioning of the system is often prohibitive. Models are required which can capture the thermodynamic properties of the building with sufficient accuracy for meaningful predictions to be made. Furthermore, a large number of tuning weights may need to be determined to achieve a desired performance. For MPC to become a practicable alternative, these issues must be addressed. Acknowledging the impact of the external environment as well as the interaction of occupants on the thermal behaviour of the building, in this work techniques have been developed for deriving building models from data in which large, unmeasured disturbances are present. A spatio-temporal filtering process was introduced to determine estimates of the disturbances from measured data, which were then incorporated with metaheuristic search techniques to derive high-order simulation models capable of replicating the thermal dynamics of a building. While a high-order simulation model allowed control strategies to be analysed and compared, low-order models were required for use within the MPC strategy itself. The disturbance estimation techniques were adapted for use with system-identification methods to derive such models. MPC formulations were then derived to enable a more straightforward commissioning process and implemented in a validated simulation platform. A prioritised-objective strategy was developed which allowed the tuning parameters typically associated with an MPC cost function to be omitted from the formulation, by separating the conflicting requirements of comfort satisfaction and energy reduction within a lexicographic framework. The improved ability of the formulation to be set up and reconfigured in faulted conditions was shown.
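The prioritised-objective idea can be illustrated with a two-stage (lexicographic) optimisation: first minimise the comfort-band violation, then minimise energy use subject to not worsening the comfort optimum. The sketch below is a minimal illustration assuming a hypothetical one-state linear building model; the parameters a and b, the disturbance estimate d, and the comfort band are placeholders, not the thesis's validated models.

```python
# Minimal sketch of one prioritised-objective (lexicographic) MPC step.
# The one-state model T[k+1] = a*T[k] + b*u[k] + d[k] and all numbers
# are illustrative assumptions, not the thesis's building model.
import cvxpy as cp
import numpy as np

N = 24                               # prediction horizon (hours)
a, b = 0.9, 0.2                      # assumed discrete-time thermal parameters
d = 0.5 * np.ones(N)                 # assumed disturbance estimate (e.g. from filtering)
T0, T_low, T_high = 18.0, 20.0, 24.0

u = cp.Variable(N, nonneg=True)      # heating input
T = cp.Variable(N + 1)               # room temperature
s = cp.Variable(N, nonneg=True)      # comfort-band violation (slack)

dyn = [T[0] == T0] + [T[k + 1] == a * T[k] + b * u[k] + d[k] for k in range(N)]
comfort = [T[1:] >= T_low - s, T[1:] <= T_high + s]

# Stage 1: minimise total comfort violation only.
p1 = cp.Problem(cp.Minimize(cp.sum(s)), dyn + comfort)
p1.solve()
best_violation = p1.value

# Stage 2: minimise energy, without worsening the stage-1 optimum.
p2 = cp.Problem(cp.Minimize(cp.sum(u)),
                dyn + comfort + [cp.sum(s) <= best_violation + 1e-6])
p2.solve()
print("heating plan:", np.round(u.value, 2))
```

The point of the two-stage structure is that no relative weighting between comfort and energy has to be chosen: comfort is prioritised structurally rather than through tuning weights in a single cost function.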
Abstract:
We propose a method, denoted as the synthetic portfolio, for event studies in market microstructure that is particularly interesting to use with high-frequency data and thinly traded markets. The method is based on the Synthetic Control Method (SCM) and provides a robust, data-driven way to build a counterfactual for evaluating the effects of volatility call auctions. We find that SCM can be used if the loss function is defined as the difference between the returns of the asset and the returns of a synthetic portfolio. We apply SCM to test the performance of the volatility call auction as a circuit breaker in the context of an event study. We find that, for Colombian Stock Market securities, the asynchronicity of intraday data reduces the analysis to a selected group of stocks; however, it is possible to build a tracking portfolio. The realized volatility increases after the auction, indicating that the mechanism is not enhancing the price discovery process.
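The synthetic-portfolio construction can be sketched as a constrained least-squares fit: find donor weights that are nonnegative and sum to one (the standard SCM constraints) so the portfolio tracks the treated asset's pre-event returns, then read the post-event tracking error as the counterfactual gap. The donor data and window lengths below are simulated placeholders, not the thesis's Colombian market data.

```python
# Sketch of building a synthetic portfolio: donor weights (nonnegative,
# summing to 1) that track the treated asset's pre-event returns; the
# post-event tracking error serves as the effect estimate.  Data simulated.
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
pre, post, n_donors = 120, 30, 8                   # illustrative window sizes
donor_pre = rng.normal(0, 0.01, (pre, n_donors))   # placeholder donor returns
asset_pre = donor_pre @ np.array([0.5, 0.3, 0.2] + [0.0] * 5) \
            + rng.normal(0, 0.002, pre)            # placeholder treated asset

def loss(w):
    # pre-event tracking error between the asset and the synthetic portfolio
    return np.sum((asset_pre - donor_pre @ w) ** 2)

cons = [{"type": "eq", "fun": lambda w: np.sum(w) - 1.0}]
bounds = [(0.0, 1.0)] * n_donors
w0 = np.full(n_donors, 1.0 / n_donors)
w = minimize(loss, w0, bounds=bounds, constraints=cons).x

donor_post = rng.normal(0, 0.01, (post, n_donors))  # placeholder post-event data
asset_post = rng.normal(0, 0.01, post)
gap = asset_post - donor_post @ w                   # per-period effect estimate
print("weights:", np.round(w, 3), "mean post-event gap:", round(gap.mean(), 5))
```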
Abstract:
This thesis studies how commercial practice is developing with artificial intelligence (AI) technologies and discusses some normative concepts in EU consumer law. The author analyses the phenomenon of 'algorithmic business', which denotes the increasing use of data-driven AI in marketing organisations for the optimisation of a range of consumer-related tasks. The phenomenon is orienting business-consumer relations towards some general trends that influence the power and behaviour of consumers. These developments are not taking place in a legal vacuum, but against the background of a normative system aimed at maintaining fairness and balance in market transactions. The author assesses current developments in commercial practices in the context of EU consumer law, which is specifically aimed at regulating commercial practices. The analysis is critical by design and, without neglecting concrete practices, tries to look at the big picture. The thesis consists of nine chapters divided into three thematic parts. The first part discusses the deployment of AI in marketing organisations: a brief history, the technical foundations, and their modes of integration in business organisations. In the second part, a selected number of socio-technical developments in commercial practice are analysed. The following are addressed: the monitoring and analysis of consumers' behaviour based on data; the personalisation of commercial offers and customer experience; the use of information on consumers' psychology and emotions; and the mediation of commercial relations through conversational marketing applications. The third part assesses these developments in the context of EU consumer law and of the broader policy debate concerning consumer protection in the algorithmic society. In particular, two normative concepts underlying the EU fairness standard are analysed: manipulation, as a substantive regulatory standard that limits commercial behaviours in order to protect consumers' informed and free choices; and vulnerability, as a concept of social policy that portrays people who are more exposed to marketing practices.
Abstract:
This dissertation explores the link between hate crimes that occurred in the United Kingdom in June 2017, June 2018 and June 2019 through the posts of a robust sample of Conservative and radical right users on Twitter. In order to avoid the traditional challenges of this kind of research, I adopted a four-stage research protocol that enabled me to merge content produced by a group of randomly selected users and to observe the phenomenon from different angles. I collected tweets from thirty Conservative/right-wing accounts for each month of June over the three years with the help of programming languages such as Python and of CygWin tools. I then examined the language of my data, focussing on humorous content, in order to reveal whether, and if so how, radical users online use humour as a tool to spread their views in conditions of heightened disgust and widespread political instability. A reflection on humour as a moral occurrence, expanding on the works of Christie Davies as well as applying recent findings on the behavioural immune system to online data, offers new insights into the overlooked humorous nature of radical political discourse. An unorthodox take on the moral foundations pioneered by Jonathan Haidt enriched my understanding of the analysed material through the addition of a moral-based layer of enquiry to my more traditional content-based one. This convergence of theoretical insights, data-driven analysis and real-life events constitutes a viable “collection of strategies” for academia, data scientists, NGOs fighting hate crimes and the wider public alike. Bringing the ideas of Davies, Haidt and others together with my data helps us to perceive humorous online content in terms of complex radical narratives that are all too often compressed into a single tweet.
Abstract:
The inferior alveolar nerve (IAN) lies within the mandibular canal, referred to in the literature as the inferior alveolar canal. The detection of this nerve is important during maxillofacial surgeries or for creating dental implants. The poor quality of cone-beam computed tomography (CBCT) and computed tomography (CT) scans and/or bone gaps within the mandible increase the difficulty of this task, posing a challenge to the human experts who must detect the nerve manually and making the process time-consuming. Therefore, this thesis investigates two methods to automatically detect the IAN: a non-data-driven technique and a deep-learning method. The latter tracks the IAN position at each frame, leveraging detections obtained with the deep neural network CenterNet, fine-tuned for our task, together with temporal and spatial information.
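As a rough illustration of how per-frame detections can be combined with temporal and spatial information, the sketch below gates each new detection against a constant-velocity prediction and blends the two. This is a generic tracker, not the thesis's actual post-processing; the (x, y) positions, gating threshold and blending factor are illustrative assumptions.

```python
# Generic sketch: fuse noisy per-frame detections (e.g. CenterNet outputs)
# with a constant-velocity prediction and a gating distance.  Not the
# thesis's tracking algorithm; thresholds and data are illustrative.
import numpy as np

def track(detections, gate=10.0, alpha=0.6):
    """detections: list of (x, y) or None per frame; returns a smoothed track."""
    pos, vel, out = None, np.zeros(2), []
    for det in detections:
        pred = pos + vel if pos is not None else None        # temporal prediction
        if det is not None:
            det = np.asarray(det, dtype=float)
            if pred is None or np.linalg.norm(det - pred) < gate:
                new = det if pred is None else alpha * det + (1 - alpha) * pred
            else:
                new = pred                                    # reject outlier detection
        else:
            new = pred                                        # missing detection: coast
        if new is not None and pos is not None:
            vel = new - pos                                   # update velocity estimate
        pos = new
        out.append(None if pos is None else tuple(np.round(pos, 1)))
    return out

print(track([(10, 10), (11, 10), None, (40, 40), (13, 11)]))
```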
Abstract:
This Thesis is composed of a collection of works written in the period 2019-2022, whose aim is to find methodologies of Artificial Intelligence (AI) and Machine Learning to detect and classify patterns and rules in argumentative and legal texts. We define our approach as “hybrid”, since we aimed at designing hybrid combinations of symbolic and sub-symbolic AI, involving both “top-down” structured knowledge and “bottom-up” data-driven knowledge. A first group of works is dedicated to the classification of argumentative patterns. Following the Waltonian model of argument and the related theory of Argumentation Schemes, these works focused on the detection of argumentative support and opposition, showing that argumentative evidence can be classified at fine-grained levels without resorting to highly engineered features. To show this, our methods involved not only traditional approaches such as TF-IDF, but also some novel methods based on Tree Kernel algorithms. After the encouraging results of this first phase, we explored the use of some emerging methodologies promoted by actors like Google, which have deeply changed NLP since 2018-19, i.e., Transfer Learning and language models. These new methodologies markedly improved our previous results, providing us with our best-performing NLP tools. Using Transfer Learning, we also performed a Sequence Labelling task to recognize the exact span of argumentative components (i.e., claims and premises), thus connecting portions of natural language to portions of arguments (i.e., to the logical-inferential dimension). The last part of our work was finally dedicated to the employment of Transfer Learning methods for the detection of rules and deontic modalities. In this case, we explored a hybrid approach which combines structured knowledge coming from two LegalXML formats (i.e., Akoma Ntoso and LegalRuleML) with sub-symbolic knowledge coming from pre-trained (and then fine-tuned) neural architectures.
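A minimal sketch of the traditional baseline mentioned above: TF-IDF features with a linear classifier to label argumentative text as support or opposition. The scikit-learn pipeline and toy sentences are assumptions for illustration only, not the thesis's corpora or trained models.

```python
# Minimal TF-IDF baseline for classifying argumentative support vs. opposition.
# Toy data and labels are placeholders; the thesis's datasets are not reproduced.
from sklearn.pipeline import make_pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

texts = [
    "This evidence strongly confirms the claim made by the proponent.",
    "The cited study directly supports the main conclusion.",
    "This finding contradicts the premise of the argument.",
    "The counterexample undermines the stated conclusion.",
]
labels = ["support", "support", "opposition", "opposition"]

clf = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)), LogisticRegression())
clf.fit(texts, labels)
print(clf.predict(["The new data corroborate the author's claim."]))
```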
Abstract:
Inverse problems are at the core of many challenging applications. Variational and learning models provide estimated solutions of inverse problems as the outcome of specific reconstruction maps. In the variational approach, the result of the reconstruction map is the solution of a regularized minimization problem encoding information on the acquisition process and prior knowledge on the solution. In the learning approach, the reconstruction map is a parametric function whose parameters are identified by solving a minimization problem depending on a large set of data. In this thesis, we go beyond this apparent dichotomy between variational and learning models and show that they can be harmoniously merged in unified hybrid frameworks preserving their main advantages. We develop several highly efficient methods based on both these model-driven and data-driven strategies, for which we provide a detailed convergence analysis. The arising algorithms are applied to solve inverse problems involving images and time series. For each task, we show that the proposed schemes improve on the performance of many other existing methods in terms of both computational burden and quality of the solution. In the first part, we focus on gradient-based regularized variational models, which are shown to be effective for segmentation purposes and for thermal and medical image enhancement. We consider gradient sparsity-promoting regularized models for which we develop different strategies to estimate the regularization strength. Furthermore, we introduce a novel gradient-based Plug-and-Play convergent scheme employing a deep-learning-based denoiser trained on the gradient domain. In the second part, we address the tasks of natural image deblurring, image and video super-resolution microscopy, and positioning time series prediction through deep-learning-based methods. We boost the performance of both supervised deep learning strategies, such as trained convolutional and recurrent networks, and unsupervised ones, such as Deep Image Prior, by penalizing the losses with handcrafted regularization terms.
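A generic Plug-and-Play proximal-gradient iteration, in which a denoiser replaces the proximal operator of the regulariser, can be sketched as follows. The Gaussian-blur "denoiser" and the toy forward operator are stand-ins; the thesis's gradient-domain denoiser and its convergence conditions are not reproduced here.

```python
# Generic Plug-and-Play proximal-gradient sketch for y = A x + noise:
#   x_{k+1} = D( x_k - tau * A^T (A x_k - y) )
# The smoothing "denoiser" D is only a placeholder for a learned denoiser.
import numpy as np
from scipy.ndimage import gaussian_filter

rng = np.random.default_rng(0)
n = 64
A = rng.normal(0, 1, (n, n)) / np.sqrt(n)      # toy forward operator
x_true = np.zeros(n); x_true[20:30] = 1.0      # piecewise-constant signal
y = A @ x_true + rng.normal(0, 0.01, n)

def denoiser(v, sigma=1.5):
    return gaussian_filter(v, sigma)           # placeholder for a trained denoiser

tau = 0.5 / np.linalg.norm(A, 2) ** 2          # step size below 1/L
x = np.zeros(n)
for _ in range(200):
    grad = A.T @ (A @ x - y)                   # data-fidelity gradient
    x = denoiser(x - tau * grad)               # "prox" replaced by the denoiser

print("relative error:", np.linalg.norm(x - x_true) / np.linalg.norm(x_true))
```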
Abstract:
In the last decades, Artificial Intelligence has witnessed multiple breakthroughs in deep learning. In particular, purely data-driven approaches have opened the way to a wide variety of successful applications due to the large availability of data. Nonetheless, the integration of prior knowledge is still required to compensate for specific issues like lack of generalization from limited data, fairness, robustness, and biases. In this thesis, we analyze the methodology of integrating knowledge into deep learning models in the field of Natural Language Processing (NLP). We start by remarking on the importance of knowledge integration. We highlight the possible shortcomings of these approaches and investigate the implications of integrating unstructured textual knowledge. We introduce Unstructured Knowledge Integration (UKI) as the process of integrating unstructured knowledge into machine learning models. We discuss UKI in the field of NLP, where knowledge is represented in a natural language format. We identify UKI as a complex process comprised of multiple sub-processes, different knowledge types, and knowledge integration properties to guarantee. We remark on the challenges of integrating unstructured textual knowledge and draw connections with well-known research areas in NLP. We provide a unified vision of structured knowledge extraction (KE) and UKI by identifying KE as a sub-process of UKI. We investigate some challenging scenarios where structured knowledge is not a feasible prior assumption and formulate each task from the point of view of UKI. We adopt simple yet effective neural architectures and discuss the challenges of such an approach. Finally, we identify KE as a form of symbolic representation. From this perspective, we remark on the need to define sophisticated UKI processes to verify the validity of knowledge integration. To this end, we foresee frameworks capable of combining symbolic and sub-symbolic representations for learning as a solution.
Abstract:
The research project is focused on the investigation of the polymorphism of crystalline molecular materials for organic semiconductor applications under non-ambient conditions, and on the solid-state characterization and crystal structure determination of the different polymorphic forms. In particular, this research project has tackled the investigation and characterization of the polymorphism of perylene diimide (PDI) derivatives at high temperatures and pressures, in particular N,N'-dialkyl-3,4,9,10-perylenediimide (PDI-Cn, with n = 5, 6, 7, 8). These molecules are characterized by excellent chemical, thermal, and photostability, high electron affinity, strong absorption in the visible region, low LUMO energies, good air stability, and good charge transport properties, which can be tuned via functionalization; these features make them promising n-type organic semiconductor materials for several applications such as OFETs, OPV cells, laser dyes, sensors, bioimaging, etc. The thermal characterization of PDI-Cn was carried out by a combination of differential scanning calorimetry, variable-temperature X-ray diffraction, hot-stage microscopy, and, in the case of PDI-C5, also variable-temperature Raman spectroscopy. Crystal structure determination was carried out by both single-crystal and powder X-ray diffraction. Moreover, high-pressure polymorphism was investigated via pressure-dependent UV-Vis absorption spectroscopy and high-pressure single-crystal X-ray diffraction. A data-driven approach based on a combination of self-organizing maps (SOM) and principal component analysis (PCA) is also reported, which was used to classify different π-stacking arrangements of PDI derivatives into families of similar crystal packing. Besides the main project, in the framework of structure-property analysis under non-ambient conditions, the structural investigation of the water loss upon heating in Pt- and Pd-based vapochromic potassium/lithium salts, and the investigation of structure-mechanical property relationships in polymorphs of a thienopyrrolyldione end-capped oligothiophene (C4-NT3N), are reported.
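The SOM + PCA classification step can be illustrated generically: reduce packing descriptors with PCA, train a small self-organizing map, and read off the winning node as a packing-family label. The descriptor matrix below is random placeholder data rather than the thesis's geometric descriptors of π-stacking, and the third-party minisom package is used as one possible SOM implementation.

```python
# Generic sketch of PCA + self-organizing map (SOM) clustering of crystal
# packing descriptors.  Random placeholder data; not the thesis's descriptors.
import numpy as np
from sklearn.decomposition import PCA
from minisom import MiniSom   # third-party package: pip install minisom

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (60, 10))            # 60 structures x 10 packing descriptors

Z = PCA(n_components=3).fit_transform(X)  # low-dimensional representation

som = MiniSom(4, 4, 3, sigma=1.0, learning_rate=0.5, random_seed=0)
som.train_random(Z, 500)                  # unsupervised SOM training

families = [som.winner(z) for z in Z]     # winning node = packing-family label
print(families[:5])
```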
Abstract:
In the framework of industrial problems, the application of Constrained Optimization is known to have very good modeling capability and performance overall, and stands as one of the most powerful, explored, and exploited tools to address prescriptive tasks. The number of applications is huge, ranging from logistics to transportation, packing, production, telecommunication, scheduling, and much more. The main reason behind this success is to be found in the remarkable effort put in over the last decades by the OR community to develop realistic models and devise exact or approximate methods to solve the largest variety of constrained or combinatorial optimization problems (COPs), together with the spread of computational power and easily accessible OR software and resources. On the other hand, technological advancements have led to a wealth of data never seen before and increasingly push towards methods able to extract useful knowledge from it; among the data-driven methods, Machine Learning techniques appear to be among the most promising, thanks to their successes in domains like Image Recognition, Natural Language Processing and game playing, but also to the amount of research involved. The purpose of the present research is to study how Machine Learning and Constrained Optimization can be used together to achieve systems able to leverage the strengths of both methods: this would open the way to exploiting decades of research on resolution techniques for COPs and to constructing models able to adapt and learn from available data. In the first part of this work, we survey the existing techniques and classify them according to the type, method, or scope of the integration; subsequently, we introduce Moving Target, a novel and general algorithm devised to inject knowledge into learning models through constraints. In the last part of the thesis, two applications stemming from real-world projects, carried out in collaboration with Optit, are presented.
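As a generic illustration of coupling a learned model with constrained optimization (explicitly not the Moving Target algorithm introduced in the thesis), the sketch below fits a regressor and then projects its predictions onto a simple feasible set, so that the final output respects a hard constraint. All data, the budget constraint and the model choice are placeholder assumptions.

```python
# Generic sketch of combining Machine Learning with Constrained Optimization:
# fit a regressor, then project its predictions onto a feasible set (here,
# nonnegative values summing to a fixed budget).  Illustrative only; this is
# not the thesis's Moving Target method.
import numpy as np
import cvxpy as cp
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
X = rng.normal(0, 1, (100, 5))
y = X @ np.array([1.0, 0.5, 0.0, -0.3, 0.2]) + rng.normal(0, 0.1, 100)
model = Ridge().fit(X, y)

pred = model.predict(rng.normal(0, 1, (4, 5)))    # raw, possibly infeasible outputs

budget = 10.0
z = cp.Variable(4, nonneg=True)                   # feasible decision variables
proj = cp.Problem(cp.Minimize(cp.sum_squares(z - pred)),
                  [cp.sum(z) == budget])
proj.solve()
print("raw:", np.round(pred, 2), "feasible:", np.round(z.value, 2))
```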
Abstract:
Biobanks are key infrastructures in data-driven biomedical research. The counterpoint of this optimistic vision is the reality of biobank governance, which must address various ethical, legal and social issues, especially in terms of open consent, privacy and secondary uses which, if not sufficiently resolved, may undermine participants’ and society’s trust in biobanking. The effect of the digital paradigm on biomedical research has only accentuated these issues by adding new pressure for the data protection of biobank participants against the risks of covert discrimination, abuse of power against individuals and groups, and critical commercial uses. Moreover, the traditional research-ethics framework has been unable to keep pace with the transformative developments of the digital era, and has proven inadequate in protecting biobank participants and providing guidance for ethical practices. To this must be added the challenge of an increased tendency towards exploitation and the commercialisation of personal data in the field of biomedical research, which may undermine the altruistic and solidaristic values associated with biobank participation and risk losing alignment with societal interests in biobanking. My research critically analyses, from a bioethical perspective, the challenges and the goals of biobank governance in data-driven biomedical research in order to understand the conditions for the implementation of a governance model that can foster biomedical research and innovation, while ensuring adequate protection for biobank participants and an alignment of biobank procedures and policies with society’s interests and expectations. The main outcome is a conceptualisation of a socially-oriented and participatory model of biobanks by proposing a new ethical framework that relies on the principles of transparency, data protection and participation to tackle the key challenges of biobanks in the digital age and that is well-suited to foster these goals.
Abstract:
In this thesis we focus on the analysis and interpretation of time-dependent deformations recorded through different geodetic methods. Firstly, we apply a variational Bayesian Independent Component Analysis (vbICA) technique to GPS daily displacement solutions, to separate the postseismic deformation that followed the mainshocks of the 2016-2017 Central Italy seismic sequence from the other, hydrological, deformation sources. By interpreting the signal associated with the postseismic relaxation, we model an afterslip distribution on the faults involved in the mainshocks that is consistent with the co-seismic models available in the literature. We find evidence of aseismic slip on the Paganica fault, responsible for the Mw 6.1 2009 L'Aquila earthquake, highlighting the importance of aseismic slip and static stress transfer to properly model the recurrence of earthquakes on nearby fault segments. We infer a possible viscoelastic relaxation of the lower crust as a contributing mechanism to the postseismic displacements. We highlight the importance of a proper separation of the hydrological signals for an accurate assessment of the tectonic processes, especially in cases of mm-scale deformations. Contextually, we provide a physical explanation for the independent components (ICs) associated with the observed hydrological processes. In the second part of the thesis, we focus on strain data from Gladwin Tensor Strainmeters, working on the instruments deployed in Taiwan. We develop a novel, completely data-driven approach to calibrate these strainmeters. We carry out a joint analysis of geodetic (strainmeters, GPS and GRACE products) and hydrological (rain gauges and piezometers) data sets to characterize the hydrological signals in Southern Taiwan. Lastly, we apply the calibration approach proposed here to the strainmeters recently installed in Central Italy. We provide, as an example, the detection of a storm that hit the Umbria-Marche regions (Italy), demonstrating the potential of strainmeters in following the dynamics of deformation processes with limited spatio-temporal signature.
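The signal-separation step can be illustrated with a generic ICA decomposition of multi-station displacement time series. FastICA from scikit-learn is used here as a stand-in for the vbICA technique used in the thesis, and the synthetic postseismic and seasonal series are placeholders.

```python
# Generic sketch: separate mixed displacement sources (e.g. postseismic decay
# vs. seasonal hydrological loading) from multi-station GPS time series with
# ICA.  FastICA stands in for the thesis's vbICA; all data are synthetic.
import numpy as np
from sklearn.decomposition import FastICA

t = np.linspace(0, 3, 1095)                         # ~3 years of daily samples
postseismic = 1.0 - np.exp(-t / 0.3)                # relaxation-like decay
hydrological = 0.5 * np.sin(2 * np.pi * t)          # annual seasonal signal
S = np.column_stack([postseismic, hydrological])

rng = np.random.default_rng(0)
mixing = rng.normal(0, 1, (12, 2))                  # 12 GPS stations, 2 sources
X = S @ mixing.T + rng.normal(0, 0.02, (1095, 12))  # observed displacements

ica = FastICA(n_components=2, random_state=0)
sources = ica.fit_transform(X)                      # recovered temporal sources
print(sources.shape, ica.mixing_.shape)             # (1095, 2), (12, 2)
```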
Abstract:
The integration of quantitative data from movement analysis technologies is reshaping the analysis of athletes' performance and injury mitigation, e.g., anterior cruciate ligament (ACL) rupture. Most movement assessments are performed in laboratory environments. Recent progress provides the chance to shift the paradigm to a more ecological approach, with sport-specific elements and a closer examination of “real” movement patterns associated with performance and (ACL) injury risk. The present PhD thesis aimed at investigating the on-field motion patterns related to performance and injury prevention in young football players. The objectives of the thesis were: (I) in-lab measures of high-dynamics movements were used to validate wearable inertial sensor technology; (II) in-laboratory and on-field agility movement tasks were compared to inspect the effect of the football-specific environment; (III) on-field analysis was conducted to challenge wearable sensor technology in the assessment of dangerous movement patterns leading towards ACL rupture; (IV) an overview of technologies that could shape present and future assessment of ACL injury risk in daily practice was presented. The validity of wearables in the assessment of high-dynamics movements was confirmed. Relevant differences emerged between the movements performed in a laboratory setting and on the football pitch, supporting the inclusion of an ecological dynamics approach in preventive protocols. The on-field analysis of football-specific movement tasks demonstrated good reliability of wearable sensors and the presence of residual dangerous patterns in the injured players. A tool to inspect at-risk movement patterns on the field through objective measurements was presented. It was also discussed how potential alternatives to wearable inertial sensors could embrace artificial intelligence and a closer collaboration between clinical and technical expertise. The present thesis was meant to contribute to setting the basis for data-driven prevention protocols. A deeper comprehension of injury-related principles and counteractions will contribute to preserving athletes' careers and health over time.
Abstract:
Protected crop production is a modern and innovative approach to cultivating plants in a controlled environment to optimize growth, yield, and quality. This method involves using structures such as greenhouses or tunnels to create a sheltered environment. These productive solutions are characterized by a careful regulation of variables like temperature, humidity, light, and ventilation, which collectively contribute to creating an optimal microclimate for plant growth. Heating, cooling, and ventilation systems are used to maintain optimal conditions for plant growth, regardless of external weather fluctuations. Protected crop production plays a crucial role in addressing challenges posed by climate variability, population growth, and food security. Similarly, animal husbandry involves providing adequate nutrition, housing, medical care and environmental conditions to ensure animal welfare. Sustainability is a critical consideration in all forms of agriculture, including protected crop and animal production. Sustainability in animal production refers to the practice of producing animal products in a way that minimizes negative impacts on the environment, promotes animal welfare, and ensures the long-term viability of the industry. The research activities performed during the PhD fall squarely within the field of Precision Agriculture and Livestock Farming. The focus is on the computational fluid dynamics (CFD) approach and on environmental assessment, applied to improve yield, resource efficiency, environmental sustainability, and cost savings. This represents a significant shift from traditional farming methods to a more technology-driven, data-driven, and environmentally conscious approach to crop and animal production. On one side, CFD is a powerful and precise computer modelling and simulation technique for airflows and thermo-hygrometric parameters, which has been applied to optimize the growth environment of crops and the efficiency of ventilation in pig barns. On the other side, the sustainability aspect has been investigated through Life Cycle Assessment analyses.