21 results for Data classification
Abstract:
In recent years, there has been exponential growth in the use of virtual spaces, including dialogue systems, that handle personal information. The concept of personal privacy is discussed and controversial in the literature, whereas in the technological field it directly influences the degree of reliability perceived in an information system (privacy 'as trust'). This work aims to protect the right to privacy over personal data (GDPR, 2018) and to avoid the loss of sensitive content by exploring the sensitive information detection (SID) task. It is grounded on the following research questions: (RQ1) What does sensitive data mean? How can a personal sensitive information domain be defined? (RQ2) How can a state-of-the-art model for SID be created? (RQ3) How should the model be evaluated? RQ1 theoretically investigates the concepts of privacy and the state-of-the-art ontological representation of personal information. The Data Privacy Vocabulary (DPV) is the taxonomic resource taken as the authoritative reference for the definition of the knowledge domain. Concerning RQ2, we investigate two approaches to classify sensitive data: the first, bottom-up, explores automatic learning methods based on transformer networks; the second, top-down, proposes logical-symbolic methods through the construction of privaframe, a knowledge graph of compositional frames representing personal data categories. Both approaches are tested. For the evaluation (RQ3), we create SPeDaC, a sentence-level labeled resource. It can be used as a benchmark or training resource for the SID task, filling the gap left by the absence of a shared resource in this field. While the approach based on artificial neural networks confirms the validity of the direction taken by the most recent studies on SID, the logical-symbolic approach emerges as the preferred way to classify fine-grained personal data categories, thanks to the semantically grounded, tailored modeling it allows. At the same time, the results highlight the strong potential of hybrid architectures for solving automatic tasks.
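As an illustration of the bottom-up approach described above, the following minimal sketch performs sentence-level classification of personal data categories with a pretrained transformer. The backbone, label set, and example sentence are assumptions for illustration only; the thesis's actual architecture, categories, and training data (e.g., SPeDaC) may differ.

# Minimal sketch of transformer-based sensitive information detection.
# Model name, labels, and example are hypothetical, not the thesis's setup.
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

LABELS = ["health", "financial", "location", "non-sensitive"]  # assumed categories

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=len(LABELS)
)  # the classification head would be fine-tuned on a labeled corpus

sentence = "My blood pressure medication was changed last week."
inputs = tokenizer(sentence, return_tensors="pt", truncation=True)
with torch.no_grad():
    logits = model(**inputs).logits
print(LABELS[logits.argmax(dim=-1).item()])  # predicted category (untrained head here)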
Abstract:
The dissertation addresses the still unsolved challenges of source-based digital 3D reconstruction, visualisation and documentation in the domains of archaeology, art history and architectural history. The emerging BIM methodology and the IFC data exchange format are changing the way collaboration, visualisation and documentation happen in the planning, construction and facility management process. The introduction and development of the Semantic Web (Web 3.0), spreading the idea of structured, formalised and linked data, offers semantically enriched human- and machine-readable data. In contrast to civil engineering and cultural heritage, object-oriented academic disciplines such as archaeology, art history and architectural history have acted as outside spectators. Since the 1990s, it has been argued that a 3D model is not likely to be considered a scientific reconstruction unless it is grounded in accurate documentation and visualisation. However, these standards are still missing and the validation of the outcomes remains unfulfilled. Meanwhile, the digital research data remain ephemeral and continue to fill the growing digital cemeteries. This study therefore focuses on the evaluation of source-based digital 3D reconstructions and, especially, on uncertainty assessment in the case of hypothetical reconstructions of destroyed or never-built artefacts according to scientific principles, making the models shareable and reusable by a potentially wide audience. The work initially focuses on terminology and on the definition of a workflow, especially related to the classification and visualisation of uncertainty. The workflow is then applied to specific cases of 3D models uploaded to the DFG repository of the AI Mainz. In this way, the available methods of documenting, visualising and communicating uncertainty are analysed. In the end, this process leads to a validation or correction of the workflow and the initial assumptions, but also, where different hypotheses are dealt with, to a better definition of the levels of uncertainty.
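To make the uncertainty classification and visualisation step concrete, the sketch below maps a hypothetical ordinal uncertainty scale onto false colours for the elements of a reconstruction model. The five-level scale, its labels, and the colours are assumptions for illustration, not the dissertation's actual scheme.

# Minimal sketch: false-colour visualisation of per-element uncertainty.
# The 5-level scale, labels, and colours are hypothetical.

UNCERTAINTY_SCALE = {
    1: ("directly documented", "#1a9641"),  # e.g. surviving fabric, measured
    2: ("primary sources",     "#a6d96a"),  # plans, photographs
    3: ("secondary sources",   "#ffffbf"),  # descriptions, analogies
    4: ("expert conjecture",   "#fdae61"),
    5: ("pure hypothesis",     "#d7191c"),
}

def colour_for(element_id: str, level: int) -> str:
    label, colour = UNCERTAINTY_SCALE[level]
    print(f"{element_id}: level {level} ({label}) -> {colour}")
    return colour

# Hypothetical model elements with assessed uncertainty levels
for elem, lvl in [("north_facade", 1), ("roof_truss", 3), ("spire", 5)]:
    colour_for(elem, lvl)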
Abstract:
The workflow was the following. Preliminary phase: identification of 18 formalin-fixed, paraffin-embedded (FFPE) samples from 9 patients (9 actinic keratosis (AK) lesions matched with 9 squamous cell carcinoma (SCC) lesions). Working on the biopsy samples, we performed RNA extraction and analysis with droplet digital PCR (ddPCR), followed by data analysis. Second and final phase: evaluation of 39 additional subjects (36 men and 3 women). Results: We evaluated and compared the expression of the following miRNAs: miR-320 (a miRNA involved in apoptosis and cell proliferation control), miR-204 (a miRNA involved in cell proliferation) and miR-16-5p (a miRNA involved in apoptosis). Conclusion: Our data suggest that there is no significant variation in the expression of the three tested microRNAs between adjacent AK lesions and squamous cell carcinoma, although a relevant trend was observed. Furthermore, evaluating the miRNA expression trend between the keratosis and the carcinoma of the same patient, no uniform trend emerged: in some samples expression rises in the transition from AK to SCC, and in others it falls.
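A minimal sketch of the kind of matched-lesion comparison described here, using a paired Wilcoxon signed-rank test on ddPCR expression values. The values are randomly generated placeholders, not the study's data, and the test choice is an assumption for illustration.

# Minimal sketch: paired comparison of miRNA expression (AK vs matched SCC).
# Values are randomly generated placeholders, not the study's ddPCR data.
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(0)
ak  = rng.lognormal(mean=2.0, sigma=0.5, size=9)  # expression in AK lesions
scc = rng.lognormal(mean=2.1, sigma=0.5, size=9)  # matched SCC lesions

stat, p = wilcoxon(ak, scc)  # paired, non-parametric test on the differences
print(f"Wilcoxon statistic={stat:.2f}, p={p:.3f}")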
Abstract:
In this work, we explore and demonstrate the potential for modeling and classification using quantile-based distributions, i.e., random variables defined by their quantile function. In the first part, we formalize a least squares estimation framework for the class of linear quantile functions, leading to unbiased and asymptotically normal estimators. Among the distributions with a linear quantile function, we focus on the flattened generalized logistic distribution (fgld), which offers a wide range of distributional shapes. A novel naïve Bayes classifier is proposed that utilizes the fgld estimated via least squares, and through simulations and applications we demonstrate its competitiveness against state-of-the-art alternatives. In the second part, we consider the Bayesian estimation of quantile-based distributions. We introduce a factor model with independent latent variables distributed according to the fgld. Similar to the independent factor analysis model, this approach accommodates flexible factor distributions while using fewer parameters. The model is presented within a Bayesian framework, an MCMC algorithm for its estimation is developed, and its effectiveness is illustrated with data from the European Social Survey. The third part focuses on depth functions, which extend the concept of quantiles to multivariate data by imposing a center-outward ordering in the multivariate space. We investigate the recently introduced integrated rank-weighted (IRW) depth function, which is based on the distribution of random spherical projections of the multivariate data. This depth function proves to be computationally efficient; to increase its flexibility, we propose different methods to explicitly model the projected univariate distributions. Its usefulness is shown in classification tasks: the maximum depth classifier based on the IRW depth is proven to be asymptotically optimal under certain conditions, and classifiers based on the IRW depth are shown to perform well in simulated and real data experiments.
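As a concrete illustration of the projection-based construction described here, the sketch below approximates an IRW-style depth by Monte Carlo over random unit directions and uses it in a maximum-depth classifier. Modeling the projections with the empirical CDF, and all parameters and data, are assumptions for illustration rather than the thesis's actual methods.

# Monte Carlo sketch of an IRW-style depth: average, over random unit
# directions, the center-outward centrality of the projected point under
# the empirical distribution of the projected training data.
import numpy as np

def irw_depth(x, X, n_dirs=500, rng=None):
    rng = rng or np.random.default_rng(0)
    u = rng.normal(size=(n_dirs, X.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # random unit directions
    proj_X = X @ u.T                               # (n, n_dirs) projected sample
    proj_x = x @ u.T                               # (n_dirs,) projected point
    F = (proj_X <= proj_x).mean(axis=0)            # empirical CDF at each projection
    return np.mean(np.minimum(F, 1.0 - F))         # averaged centrality

def max_depth_classify(x, classes):
    # Assign x to the class whose training cloud gives it maximal depth.
    return max(classes, key=lambda c: irw_depth(x, classes[c]))

rng = np.random.default_rng(1)
classes = {"A": rng.normal(0, 1, (200, 2)), "B": rng.normal(3, 1, (200, 2))}
print(max_depth_classify(np.array([2.5, 2.5]), classes))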
Abstract:
The abundance of visual data and the push for robust AI are driving the need for automated visual sensemaking. Computer Vision (CV) faces growing demand for models that can discern not only what images "represent," but also what they "evoke." This is a demand for tools mimicking human perception at a high semantic level, categorizing images based on concepts like freedom, danger, or safety. However, automating this process is challenging due to entropy, scarcity, subjectivity, and ethical considerations. These challenges not only impact performance but also underscore the critical need for interpretability. This dissertation focuses on abstract concept-based (AC) image classification, guided by three technical principles: situated grounding, performance enhancement, and interpretability. We introduce ART-stract, a novel dataset of cultural images annotated with ACs, serving as the foundation for a series of experiments across four key domains: assessing the effectiveness of the end-to-end DL paradigm, exploring cognitively inspired semantic intermediaries, incorporating cultural and commonsense aspects, and neuro-symbolically integrating sensory-perceptual data with cognition-based knowledge. Our results demonstrate that integrating CV approaches with semantic technologies yields methods that surpass the current state of the art in AC image classification, outperforming the end-to-end deep vision paradigm. The results emphasize the role semantic technologies can play in developing systems that are both effective and interpretable, through capturing, situating, and reasoning over knowledge related to visual data. Furthermore, this dissertation explores the complex interplay between technical and socio-technical factors. By merging technical expertise with an understanding of human and societal aspects, we advocate for responsible labeling and training practices in visual media. These insights and techniques not only advance efforts in CV and explainable artificial intelligence but also propel us toward an era of AI development that harmonizes technical prowess with a deep awareness of its human and societal implications.
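A minimal sketch of the end-to-end deep vision baseline mentioned above: fine-tuning a pretrained CNN to predict abstract-concept labels. The backbone, concept set, and random batch are assumptions for illustration, not ART-stract's actual experimental protocol.

# Minimal sketch of an end-to-end baseline for abstract-concept image
# classification. Backbone, labels, and data are hypothetical stand-ins.
import torch
import torch.nn as nn
from torchvision import models

AC_LABELS = ["freedom", "danger", "safety"]  # assumed concept set

model = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
model.fc = nn.Linear(model.fc.in_features, len(AC_LABELS))  # new AC head

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# One illustrative training step on a random batch (stand-in for real images)
images = torch.randn(8, 3, 224, 224)
targets = torch.randint(len(AC_LABELS), (8,))
optimizer.zero_grad()
loss = criterion(model(images), targets)
loss.backward()
optimizer.step()
print(f"batch loss: {loss.item():.3f}")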
Abstract:
Introduction: Recently, the American Association of Gynecologic Laparoscopists (AAGL) proposed a new classification and scoring system with the specific aim of assessing surgical complexity. This study sought to assess whether a higher AAGL score correlates with an increased risk of peri-operative complications in women undergoing surgery for endometriosis. Methods: This is a retrospective cohort study conducted in a third-level referral center. We collected data from women with endometriosis who underwent complete surgical removal of endometriosis from January 2019 to December 2021. The ENZIAN and r-ASRM classifications and the AAGL total score were calculated for each patient. The population was divided into two groups according to the occurrence or not of at least one peri-operative complication. Our primary outcome was to evaluate the correlation between the AAGL score and the occurrence of complications. Results: During the study period, we analyzed data from 282 eligible patients. Among them, 80 (28.4%) experienced peri-operative complications. No statistically significant difference was found between the two groups in terms of baseline characteristics, except for pre-operative hemoglobin (Hb), which was lower in patients with complications (p=0.001). Surgical variables associated with the occurrence of complications were recto-sigmoid surgery (p=0.003), ileocecal resection (p=0.034) and longer operative time (p=0.007). Furthermore, a higher ENZIAN B score (p=0.006), AAGL score (p=0.045) and AAGL stage (p=0.022) were found in the group of patients with complications. The multivariate analysis confirmed only the significant association between the occurrence of peri-operative complications and lower pre-operative Hb level (OR 0.74; 95% CI, 0.59–0.94; p=0.014), longer operative time (OR 1.00; 95% CI, 1.00–1.01; p=0.013), recto-sigmoid surgery, especially discoid resection (OR 8.73; 95% CI, 2.18–35; p=0.016), and ENZIAN B3 (OR 3.62; 95% CI, 1.46–8.99; p=0.006). Conclusion: According to our findings, high AAGL scores or stages do not seem to increase the risk of peri-operative complications.
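A minimal sketch of the kind of multivariate analysis reported above: logistic regression of complication occurrence on peri-operative variables, with odds ratios and 95% CIs derived from the fitted coefficients. Variable names and generated data are placeholders, not the study's records.

# Minimal sketch: multivariate logistic regression for peri-operative
# complications, reporting odds ratios with 95% CIs.
# Column names and generated data are placeholders, not the study's records.
import numpy as np
import pandas as pd
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 282
df = pd.DataFrame({
    "preop_hb":       rng.normal(13, 1.2, n),  # g/dL
    "operative_time": rng.normal(180, 60, n),  # minutes
    "rectosigmoid":   rng.integers(0, 2, n),   # surgery performed (0/1)
})
df["complication"] = rng.integers(0, 2, n)     # outcome (0/1)

X = sm.add_constant(df[["preop_hb", "operative_time", "rectosigmoid"]])
fit = sm.Logit(df["complication"], X).fit(disp=0)

odds_ratios = np.exp(fit.params)               # OR = exp(beta)
ci = np.exp(fit.conf_int())                    # 95% CI on the OR scale
print(pd.concat([odds_ratios.rename("OR"),
                 ci.rename(columns={0: "2.5%", 1: "97.5%"})], axis=1))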