893 results for data driven approach
Abstract:
This research explores how the infrastructure and uses of eBird, one of the largest citizen science projects in the world, develop and evolve across time and space. We focus on eBird's work with two of its Latin American partners, Mexico and Peru, each with a web portal managed by local organizations. eBird, now a large global network of partnerships, gives citizens around the world the opportunity to contribute to science and bird conservation through the observations they upload online. These observations are managed and stored in a unified, global database that is accessible to anyone interested in birds and their conservation. Users also take advantage of the platform's features to organize and visualize their own data and that of others. The study is based on a qualitative methodology combining observation of the web platforms with semi-structured interviews with members of the Cornell Lab of Ornithology, the eBird team, and members of the local partner organizations responsible for eBird Peru and eBird Mexico. We analyze eBird as an infrastructure, considering its social and technical aspects together as a whole. We also explore the variety of ways in which its diverse users use the platform and its data. Three major themes emerge: the importance of collaboration as a philosophy underlying eBird's development, the expansion of eBird's relationships and connections through its partnerships, and the increase in participation and in the volume of data. Finally, over time, the data and their various uses have evolved, as has what eBird represents as an infrastructure.
Abstract:
Model predictive control (MPC) has often been referred to in literature as a potential method for more efficient control of building heating systems. Though a significant performance improvement can be achieved with an MPC strategy, the complexity introduced to the commissioning of the system is often prohibitive. Models are required which can capture the thermodynamic properties of the building with sufficient accuracy for meaningful predictions to be made. Furthermore, a large number of tuning weights may need to be determined to achieve a desired performance. For MPC to become a practicable alternative, these issues must be addressed. Acknowledging the impact of the external environment as well as the interaction of occupants on the thermal behaviour of the building, in this work, techniques have been developed for deriving building models from data in which large, unmeasured disturbances are present. A spatio-temporal filtering process was introduced to determine estimates of the disturbances from measured data, which were then incorporated with metaheuristic search techniques to derive high-order simulation models, capable of replicating the thermal dynamics of a building. While a high-order simulation model allowed for control strategies to be analysed and compared, low-order models were required for use within the MPC strategy itself. The disturbance estimation techniques were adapted for use with system-identification methods to derive such models. MPC formulations were then derived to enable a more straightforward commissioning process and implemented in a validated simulation platform. A prioritised-objective strategy was developed which allowed for the tuning parameters typically associated with an MPC cost function to be omitted from the formulation by separation of the conflicting requirements of comfort satisfaction and energy reduction within a lexicographic framework. 
The improved ability of the formulation to be set up and reconfigured under fault conditions was shown.
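The prioritised-objective idea can be illustrated with a minimal sketch. It is not the formulation from the thesis: the one-step horizon, the first-order thermal model and its coefficients `a`, `b`, `c`, and the function name are all hypothetical. The point is only that comfort is optimised first and energy second, so no weighting between the two has to be tuned.

```python
def lexicographic_heating_choice(t_now, t_out, t_min, candidates,
                                 a=0.9, b=0.5, c=0.1, tol=1e-9):
    """Pick a heating input by priority: comfort first, energy second.

    Assumed toy model (not from the source): T_next = a*T + b*u + c*T_out.
    """
    def predict(u):
        return a * t_now + b * u + c * t_out

    def comfort_violation(u):
        # How far the predicted temperature falls below the comfort limit.
        return max(0.0, t_min - predict(u))

    # Priority 1: find the smallest achievable comfort violation.
    best_violation = min(comfort_violation(u) for u in candidates)
    # Keep only the inputs that achieve it (within a numerical tolerance).
    admissible = [u for u in candidates
                  if comfort_violation(u) <= best_violation + tol]
    # Priority 2: among those, use the least heating energy.
    return min(admissible)
```

With a healthy heater offering inputs 0 to 5, the sketch picks the smallest input that keeps the room at the limit. If a fault restricts the inputs to 0 to 2, the same code returns the input that violates comfort least, without any retuning.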
Abstract:
Objective: To identify the barriers to unifying an Electronic Health Record (Historia Clínica Electrónica, HCE) in Colombia. Materials and Methods: A qualitative study was conducted. Semi-structured interviews were held with professionals and experts from 22 health-sector institutions in Bogotá and the departments of Cundinamarca, Santander, Antioquia, Caldas, Huila, and Valle del Cauca. Results: Colombia is in the process of structuring the implementation of the Unified Electronic Health Record (Historia Clínica Electrónica Unificada, HCEU). Unification is currently under way in 42 public health service providers (IPSs) in the department of Cundinamarca; development of the HCEU in the country is private and in-house, owing to the particular needs of each IPS. Conclusions: Human, financial, legal, organizational, technical, and professional barriers were identified in the departments surveyed. The unification of the HCE was found to depend on an agreement of wills among the public-sector and private-sector IPSs, the health insurers (EPSs), and the national government.
Abstract:
We propose a method, denoted the synthetic portfolio, for event studies in market microstructure that is particularly well suited to high-frequency data and thinly traded markets. The method is based on the Synthetic Control Method (SCM) and provides a robust, data-driven way to build a counterfactual for evaluating the effects of volatility call auctions. We find that SCM can be used if the loss function is defined as the difference between the returns of the asset and the returns of a synthetic portfolio. We apply SCM to test the performance of the volatility call auction as a circuit breaker in the context of an event study. We find that, for Colombian Stock Market securities, the asynchronicity of intraday data restricts the analysis to a selected group of stocks; however, it is still possible to build a tracking portfolio. Realized volatility increases after the auction, indicating that the mechanism does not enhance the price discovery process.
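As a rough illustration of the loss described above, the sketch below fits a synthetic portfolio from only two donor stocks by minimising the squared difference between the asset's returns and the portfolio's returns over a pre-event window. The two-donor restriction, the squared-error form of the loss, and the function names are assumptions made for brevity; the paper's actual estimation may differ.

```python
def synthetic_weight(asset, donor_a, donor_b):
    """Weight w on donor_a (donor_b gets 1 - w) minimising squared tracking error.

    The weight is clipped to [0, 1], as in a synthetic control with
    non-negative weights that sum to one.
    """
    # Tracking error is (y - b) - w * (a - b), so least squares gives w directly.
    num = sum((y - b) * (a - b) for y, a, b in zip(asset, donor_a, donor_b))
    den = sum((a - b) ** 2 for a, b in zip(donor_a, donor_b))
    if den == 0.0:
        return 0.5  # donors are identical: any split tracks equally well
    return min(1.0, max(0.0, num / den))


def return_gap(asset, donor_a, donor_b, w):
    """Asset return minus synthetic-portfolio return, period by period."""
    return [y - (w * a + (1.0 - w) * b)
            for y, a, b in zip(asset, donor_a, donor_b)]
```

In an event study, the weight would be fitted on pre-auction returns and `return_gap` evaluated on post-auction returns, where a persistent gap suggests an effect of the auction.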
Abstract:
In Colombia, orphan diseases are defined as chronically debilitating, life-threatening conditions of low prevalence (less than 1 in 5000) and high complexity. An estimated 6000 to 8000 distinct rare diseases exist worldwide (1). In recent years, several countries, individually or collectively, have created policies and incentives for research on rare diseases and for the protection of patients who have them. However, despite the growing number of publications, information on their etiology, physiology, natural history, and epidemiology remains scarce or absent. Patient registries are a valuable tool for characterizing diseases, their management, and their outcomes with or without treatment. They make it possible to improve public health policy and patient care, contributing to better social, economic, and quality-of-life outcomes. In Colombia, Decree 1954 of 2012 and Resolutions 3681 of 2013 and 0430 of 2013 established the legal basis for creating a national registry of orphan diseases. The present study seeks to determine the sociodemographic characteristics and prevalence of orphan diseases in Colombia in 2013. Methods: An observational, cross-sectional study using secondary data was conducted on patients with orphan diseases in the national territory, based on the national registry of orphan diseases compiled by the Ministry of Health and Social Protection in 2013 under Decree 1954 of 2012 and Resolutions 3681 of 2013 and 0430 of 2013. The databases obtained were re-categorized in Excel version 15.17 for data extraction, and the subsequent statistical analysis was performed in the Statistical Package for the Social Sciences (SPSS v.20, Chicago, IL). Results: A total of 13173 patients with orphan diseases were found for 2013.
Of these, 53.96% (7132) were female and 46.03% (6083) male; the median age was 28 years, with an interquartile range of 39 years, and 9% of patients had a disability. The registry contained a total of 653 orphan diseases, 34% of all the diseases listed in our country (2). The most frequent conditions were congenital factor VIII deficiency, myasthenia gravis, von Willebrand disease, short stature due to growth hormone anomaly, and bronchopulmonary dysplasia. Discussion: An estimated 3.3 million Colombians were expected to have an orphan disease in 2013. The national registry managed to collect data on 13173 (0.4%). This low number of patients indicates substantial under-reporting, attributable to the use of ICD-10 codes, health personnel's lack of knowledge of orphan diseases, and misclassification of patients. A total of 653 diseases were found, 34% of the diseases reported in the national list of orphan diseases (2) and 7% of all diseases reported in ORPHANET for 2013 (3). Conclusions: Collecting data and raising awareness of orphan diseases among health personnel is a vitally important strategy for early diagnosis, specific control measures, and patient interventions. Properly identifying patients with these conditions allows them to be entered in the registry and therefore reduces under-reporting. However, the ideal scenario would be to use a data collection system other than ICD-10, one that covers more of the full range of orphan diseases.
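The registry coverage quoted in the discussion (0.4%) can be reproduced from the two figures given in the abstract. The variable names below are illustrative only, and 3.3 million is the abstract's own approximate estimate.

```python
estimated_cases = 3_300_000   # approximate expected cases quoted in the abstract
registered_cases = 13_173     # patients captured by the 2013 national registry

# Share of the expected cases that the registry actually captured.
coverage_percent = 100.0 * registered_cases / estimated_cases
print(f"Registry coverage: {coverage_percent:.1f}%")
```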
Abstract:
Every construction process (whether of buildings, machines, software, etc.) first requires a model of the artifact that is going to be built. This model should be based on a paradigm or meta-model, which defines the basic modeling elements: which real-world concepts can be represented, which relationships can be established among them, and so on. There should also be a language to represent, manipulate and think about that model. Usually the model must be redefined at various levels of abstraction, so both the paradigm and the language must have abstraction capacity. In this paper I characterize the relationships that exist between these concepts: model, language and abstraction. I also analyze some historical models, such as the relational model for databases, the imperative programming model and the object-oriented model. Finally, I stress the need to teach this model-driven approach to students, and to go further, to higher-level models such as component models or business models.
Abstract:
The language connectome was investigated in vivo using multimodal, non-invasive quantitative MRI. In PPA patients (n=18) recruited by the IRCCS ISNB, Bologna, cortical thickness measures showed a predominant reduction in the left hemisphere (p<0.005) with respect to matched healthy controls (HC) (n=18), and an accuracy of 86.1% in discrimination from Alzheimer’s disease patients (n=18). The left temporal and parahippocampal gyri correlated significantly (p<0.01) with language fluency. In PPA patients (n=31) recruited by Northwestern University, Chicago, DTI measures were evaluated longitudinally (2-year follow-up) under the supervision of Prof. M. Catani, King’s College London. Significant differences from matched HC (n=27) were found, tract-localized at baseline and widespread at follow-up. Language assessment scores correlated with DTI measures of the arcuate (AF) and uncinate (UF) fasciculi. In left-ischemic stroke patients (n=16) recruited by the NatBrainLab, King’s College London, language recovery was evaluated longitudinally (6-month follow-up). Using arterial spin labelling imaging, a significant correlation (p<0.01) was found between language recovery and cerebral blood flow asymmetry, towards the right, in the middle cerebral artery perfusion territory. In HC (n=29) recruited by the DIBINEM Functional MR Unit, University of Bologna, an along-tract algorithm suitable for different tractography methods was developed using the Laplacian operator. Higher left superior temporal gyrus and precentral operculum AF connectivity was found (Talozzi L et al., 2018), along with lateralized UF projections towards the left dorsal orbital cortex. In HC (n=50) recruited in the Human Connectome Project, a new tractography-driven approach was developed for left association fibres, using principal component analysis. The first component discriminated cortical areas typically connected by the AF, suggesting good discrimination of cortical areas that share a similar connectivity pattern.
The evaluation of morphological, microstructural and metabolic measures could provide in vivo biomarkers to monitor language impairment related to neurodegeneration, or a surrogate for the efficacy of cognitive rehabilitation or interventional treatment.
Abstract:
In recent decades, global food supply chains have had to deal with the increasing awareness of stakeholders and consumers about safety, quality, and sustainability. To address these new challenges for food supply chain systems, an integrated approach to designing, controlling, and optimizing the product life cycle is required. It is therefore essential to introduce new models, methods, and decision-support platforms tailored to perishable products. This thesis aims to provide novel, practice-ready decision-support models and methods to optimize the logistics of food items with an integrated and interdisciplinary approach. It presents a comprehensive review of the main peculiarities of perishable products and the environmental stresses that accelerate their quality decay. It then focuses on top-down strategies to optimize the supply chain system from the strategic to the operational decision level. Based on the criticality of the environmental conditions, the dissertation evaluates the main long-term logistics investment strategies to preserve product quality. Several models and methods are proposed to optimize logistics decisions so as to enhance the sustainability of the supply chain system while guaranteeing adequate food preservation. The models and methods proposed in this dissertation promote a climate-driven approach that integrates climate conditions and their consequences for the quality decay of products into innovative models supporting logistics decisions. Given the uncertain nature of the environmental stresses affecting the product life cycle, an original stochastic model and solution method are proposed to support practitioners in controlling and optimizing supply chain systems when facing uncertain scenarios. The application of the proposed decision-support methods to real case studies proved their effectiveness in increasing the sustainability of the perishable product life cycle.
The dissertation also presents an industry application of a global food supply chain system, further demonstrating how the proposed models and tools can be integrated to provide significant savings and sustainability improvements.
Abstract:
This thesis studies how commercial practice is developing with artificial intelligence (AI) technologies and discusses some normative concepts in EU consumer law. The author analyses the phenomenon of 'algorithmic business', which denotes the increasing use of data-driven AI in marketing organisations to optimise a range of consumer-related tasks. The phenomenon is orienting business-consumer relations towards some general trends that influence the power and behaviour of consumers. These developments are not taking place in a legal vacuum, but against the background of a normative system aimed at maintaining fairness and balance in market transactions. The author assesses current developments in commercial practices in the context of EU consumer law, which is specifically aimed at regulating commercial practices. The analysis is critical by design and, without neglecting concrete practices, tries to look at the big picture. The thesis consists of nine chapters divided into three thematic parts. The first part discusses the deployment of AI in marketing organisations: a brief history, the technical foundations, and the modes of integration in business organisations. In the second part, a selected number of socio-technical developments in commercial practice are analysed. The following are addressed: the monitoring and analysis of consumers’ behaviour based on data; the personalisation of commercial offers and customer experience; the use of information on consumers’ psychology and emotions; and mediation through conversational marketing applications. The third part assesses these developments in the context of EU consumer law and of the broader policy debate concerning consumer protection in the algorithmic society.
In particular, two normative concepts underlying the EU fairness standard are analysed: manipulation, as a substantive regulatory standard that limits commercial behaviours in order to protect consumers’ informed and free choices; and vulnerability, as a concept of social policy that portrays people who are more exposed to marketing practices.
Abstract:
This dissertation explores the link between hate crimes that occurred in the United Kingdom in June 2017, June 2018 and June 2019 through the posts of a robust sample of Conservative and radical right users on Twitter. To avoid the traditional challenges of this kind of research, I adopted a four-stage research protocol that enabled me to merge content produced by a group of randomly selected users and observe the phenomenon from different angles. I collected tweets from thirty Conservative/right-wing accounts for each month of June over the three years with the help of Python and Cygwin tools. I then examined the language of my data, focussing on humorous content, to reveal whether, and if so how, radical users online use humour as a tool to spread their views in conditions of heightened disgust and widespread political instability. A reflection on humour as a moral occurrence, expanding on the works of Christie Davies and applying recent findings on the behavioural immune system to online data, offers new insights into the overlooked humorous nature of radical political discourse. An unorthodox take on the moral foundations pioneered by Jonathan Haidt enriched my understanding of the analysed material by adding a moral-based layer of enquiry to my more traditional content-based one. This convergence of theory, data-driven analysis and real-life events constitutes a viable “collection of strategies” for academia, data scientists, NGOs fighting hate crimes and the wider public alike. Bringing the ideas of Davies, Haidt and others to my data helps us to perceive humorous online content in terms of complex radical narratives that are all too often compressed into a single tweet.
Abstract:
The inferior alveolar nerve (IAN) lies within the mandibular canal, referred to as the inferior alveolar canal in the literature. Detecting this nerve is important during maxillofacial surgery and when planning dental implants. The poor quality of cone-beam computed tomography (CBCT) and computed tomography (CT) scans, and/or bone gaps within the mandible, make this task more difficult, posing a challenge to the human experts who detect the nerve manually and making the task time-consuming. This thesis therefore investigates two methods to detect the IAN automatically: a non-data-driven technique and a deep-learning method. The latter tracks the IAN position at each frame, leveraging detections obtained with the deep neural network CenterNet, fine-tuned for our task, together with temporal and spatial information.
Abstract:
Although the debate of what data science is has a long history and has not reached a complete consensus yet, Data Science can be summarized as the process of learning from data. Guided by the above vision, this thesis presents two independent data science projects developed in the scope of multidisciplinary applied research. The first part analyzes fluorescence microscopy images typically produced in life science experiments, where the objective is to count how many marked neuronal cells are present in each image. Aiming to automate the task for supporting research in the area, we propose a neural network architecture tuned specifically for this use case, cell ResUnet (c-ResUnet), and discuss the impact of alternative training strategies in overcoming particular challenges of our data. The approach provides good results in terms of both detection and counting, showing performance comparable to the interpretation of human operators. As a meaningful addition, we release the pre-trained model and the Fluorescent Neuronal Cells dataset collecting pixel-level annotations of where neuronal cells are located. In this way, we hope to help future research in the area and foster innovative methodologies for tackling similar problems. The second part deals with the problem of distributed data management in the context of LHC experiments, with a focus on supporting ATLAS operations concerning data transfer failures. In particular, we analyze error messages produced by failed transfers and propose a Machine Learning pipeline that leverages the word2vec language model and K-means clustering. This provides groups of similar errors that are presented to human operators as suggestions of potential issues to investigate. The approach is demonstrated on one full day of data, showing promising ability in understanding the message content and providing meaningful groupings, in line with previously reported incidents by human operators.
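The grouping of error messages described in the second part can be sketched in a few lines. This is a simplified stand-in, not the thesis pipeline: bag-of-words count vectors replace the word2vec embeddings so that the example needs only the standard library, the K-means routine is a minimal hand-written version with a deterministic farthest-point initialisation, and all function names and sample messages are hypothetical.

```python
def embed(messages):
    """Turn each message into a word-count vector (stand-in for word2vec)."""
    vocab = sorted({w for m in messages for w in m.lower().split()})
    index = {w: i for i, w in enumerate(vocab)}
    vectors = []
    for m in messages:
        v = [0.0] * len(vocab)
        for w in m.lower().split():
            v[index[w]] += 1.0
        vectors.append(v)
    return vectors


def sq_dist(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def kmeans(vectors, k, iterations=10):
    """Minimal K-means; returns one cluster label per vector."""
    # Deterministic start: first vector, then repeatedly the farthest one.
    centroids = [vectors[0]]
    while len(centroids) < k:
        centroids.append(
            max(vectors, key=lambda v: min(sq_dist(v, c) for c in centroids)))
    labels = [0] * len(vectors)
    for _ in range(iterations):
        # Assignment step: nearest centroid for every vector.
        labels = [min(range(k), key=lambda j: sq_dist(v, centroids[j]))
                  for v in vectors]
        # Update step: each centroid becomes the mean of its members.
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                centroids[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


messages = [
    "connection timed out to host",
    "connection timed out to server",
    "checksum mismatch for file",
    "checksum mismatch for dataset",
]
labels = kmeans(embed(messages), k=2)
```

Messages that share most of their wording end up with the same label, which is the kind of grouping an operator could then review as a single potential issue.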
Abstract:
In industrial problems, Constrained Optimization is known to offer very good modeling capability and performance, and it stands as one of the most powerful, explored, and exploited tools for addressing prescriptive tasks. The number of applications is huge, ranging from logistics to transportation, packing, production, telecommunication, scheduling, and much more. The main reason behind this success is the remarkable effort made in recent decades by the OR community to develop realistic models and devise exact or approximate methods to solve the widest variety of constrained or combinatorial optimization problems, together with the spread of computational power and easily accessible OR software and resources. On the other hand, technological advancements have led to a wealth of data never seen before and increasingly push towards methods able to extract useful knowledge from it. Among data-driven methods, Machine Learning techniques appear to be among the most promising, thanks to their successes in domains such as Image Recognition, Natural Language Processing and game playing, but also to the amount of research involved. The purpose of the present research is to study how Machine Learning and Constrained Optimization can be used together to build systems that leverage the strengths of both: this would open the way to exploiting decades of research on resolution techniques for COPs while constructing models able to adapt to and learn from available data. In the first part of this work, we survey the existing techniques and classify them according to the type, method, or scope of the integration; subsequently, we introduce Moving Target, a novel and general algorithm devised to inject knowledge into learning models through constraints. In the last part of the thesis, two applications stemming from real-world projects carried out in collaboration with Optit are presented.
Abstract:
Biobanks are key infrastructures in data-driven biomedical research. The counterpoint of this optimistic vision is the reality of biobank governance, which must address various ethical, legal and social issues, especially in terms of open consent, privacy and secondary uses which, if not sufficiently resolved, may undermine participants’ and society’s trust in biobanking. The effect of the digital paradigm on biomedical research has only accentuated these issues by adding new pressure for the data protection of biobank participants against the risks of covert discrimination, abuse of power against individuals and groups, and critical commercial uses. Moreover, the traditional research-ethics framework has been unable to keep pace with the transformative developments of the digital era, and has proven inadequate in protecting biobank participants and providing guidance for ethical practices. To this must be added the challenge of an increased tendency towards exploitation and the commercialisation of personal data in the field of biomedical research, which may undermine the altruistic and solidaristic values associated with biobank participation and risk losing alignment with societal interests in biobanking. My research critically analyses, from a bioethical perspective, the challenges and the goals of biobank governance in data-driven biomedical research in order to understand the conditions for the implementation of a governance model that can foster biomedical research and innovation, while ensuring adequate protection for biobank participants and an alignment of biobank procedures and policies with society’s interests and expectations. The main outcome is a conceptualisation of a socially-oriented and participatory model of biobanks by proposing a new ethical framework that relies on the principles of transparency, data protection and participation to tackle the key challenges of biobanks in the digital age and that is well-suited to foster these goals.
Abstract:
Imaging mass spectrometry (IMS) represents an innovative tool in the cancer research pipeline, which is increasingly being used in clinical and pharmaceutical applications. The unique properties of the technique, especially the amount of data generated, make the handling of data from multiple IMS acquisitions challenging. This work presents a histology-driven IMS approach aiming to identify discriminant lipid signatures from the simultaneous mining of IMS data sets from multiple samples. The feasibility of the developed workflow is evaluated on a set of three human colorectal cancer liver metastasis (CRCLM) tissue sections. Lipid IMS on tissue sections was performed using MALDI-TOF/TOF MS in both negative and positive ionization modes after 1,5-diaminonaphthalene matrix deposition by sublimation. The combination of both positive and negative acquisition results was performed during data mining to simplify the process and interrogate a larger lipidome into a single analysis. To reduce the complexity of the IMS data sets, a sub data set was generated by randomly selecting a fixed number of spectra from a histologically defined region of interest, resulting in a 10-fold data reduction. Principal component analysis confirmed that the molecular selectivity of the regions of interest is maintained after data reduction. Partial least-squares and heat map analyses demonstrated a selective signature of the CRCLM, revealing lipids that are significantly up- and down-regulated in the tumor region. This comprehensive approach is thus of interest for defining disease signatures directly from IMS data sets by the use of combinatory data mining, opening novel routes of investigation for addressing the demands of the clinical setting.
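The data-reduction step, randomly selecting a fixed number of spectra from a histologically defined region of interest, might look like the sketch below. The function name, the boolean-mask representation of the region, and the fixed seed are assumptions for illustration; the abstract does not describe the authors' implementation.

```python
import random


def subsample_roi(spectra, roi_mask, n_keep, seed=0):
    """Randomly keep n_keep spectra from those flagged inside the ROI.

    spectra  : list of spectra, one per pixel (any per-pixel object)
    roi_mask : list of booleans, True where the pixel lies in the ROI
    """
    roi_indices = [i for i, inside in enumerate(roi_mask) if inside]
    rng = random.Random(seed)  # fixed seed so the selection is reproducible
    chosen = rng.sample(roi_indices, min(n_keep, len(roi_indices)))
    # Return spectra in their original acquisition order.
    return [spectra[i] for i in sorted(chosen)]
```

Keeping, for example, 50 of 500 spectra in a region corresponds to the 10-fold reduction mentioned in the abstract.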