885 resultados para Semantic Publishing, Linked Data, Bibliometrics, Informetrics, Data Retrieval, Citations
Resumo:
Survival models are being widely applied to the engineering field to model time-to-event data once censored data is here a common issue. Using parametric models or not, for the case of heterogeneous data, they may not always represent a good fit. The present study relays on critical pumps survival data where traditional parametric regression might be improved in order to obtain better approaches. Considering censored data and using an empiric method to split the data into two subgroups to give the possibility to fit separated models to our censored data, we’ve mixture two distinct distributions according a mixture-models approach. We have concluded that it is a good method to fit data that does not fit to a usual parametric distribution and achieve reliable parameters. A constant cumulative hazard rate policy was used as well to check optimum inspection times using the obtained model from the mixture-model, which could be a plus when comparing with the actual maintenance policies to check whether changes should be introduced or not.
Resumo:
Most of the existing open-source search engines, utilize keyword or tf-idf based techniques to find relevant documents and web pages relative to an input query. Although these methods, with the help of a page rank or knowledge graphs, proved to be effective in some cases, they often fail to retrieve relevant instances for more complicated queries that would require a semantic understanding to be exploited. In this Thesis, a self-supervised information retrieval system based on transformers is employed to build a semantic search engine over the library of Gruppo Maggioli company. Semantic search or search with meaning can refer to an understanding of the query, instead of simply finding words matches and, in general, it represents knowledge in a way suitable for retrieval. We chose to investigate a new self-supervised strategy to handle the training of unlabeled data based on the creation of pairs of ’artificial’ queries and the respective positive passages. We claim that by removing the reliance on labeled data, we may use the large volume of unlabeled material on the web without being limited to languages or domains where labeled data is abundant.
Resumo:
In recent years, there has been exponential growth in using virtual spaces, including dialogue systems, that handle personal information. The concept of personal privacy in the literature is discussed and controversial, whereas, in the technological field, it directly influences the degree of reliability perceived in the information system (privacy ‘as trust’). This work aims to protect the right to privacy on personal data (GDPR, 2018) and avoid the loss of sensitive content by exploring sensitive information detection (SID) task. It is grounded on the following research questions: (RQ1) What does sensitive data mean? How to define a personal sensitive information domain? (RQ2) How to create a state-of-the-art model for SID?(RQ3) How to evaluate the model? RQ1 theoretically investigates the concepts of privacy and the ontological state-of-the-art representation of personal information. The Data Privacy Vocabulary (DPV) is the taxonomic resource taken as an authoritative reference for the definition of the knowledge domain. Concerning RQ2, we investigate two approaches to classify sensitive data: the first - bottom-up - explores automatic learning methods based on transformer networks, the second - top-down - proposes logical-symbolic methods with the construction of privaframe, a knowledge graph of compositional frames representing personal data categories. Both approaches are tested. For the evaluation - RQ3 – we create SPeDaC, a sentence-level labeled resource. This can be used as a benchmark or training in the SID task, filling the gap of a shared resource in this field. If the approach based on artificial neural networks confirms the validity of the direction adopted in the most recent studies on SID, the logical-symbolic approach emerges as the preferred way for the classification of fine-grained personal data categories, thanks to the semantic-grounded tailor modeling it allows. At the same time, the results highlight the strong potential of hybrid architectures in solving automatic tasks.
Resumo:
My doctoral research is about the modelling of symbolism in the cultural heritage domain, and on connecting artworks based on their symbolism through knowledge extraction and representation techniques. In particular, I participated in the design of two ontologies: one models the relationships between a symbol, its symbolic meaning, and the cultural context in which the symbol symbolizes the symbolic meaning; the second models artistic interpretations of a cultural heritage object from an iconographic and iconological (thus also symbolic) perspective. I also converted several sources of unstructured data, a dictionary of symbols and an encyclopaedia of symbolism, and semi-structured data, DBpedia and WordNet, to create HyperReal, the first knowledge graph dedicated to conventional cultural symbolism. By making use of HyperReal's content, I showed how linked open data about cultural symbolism could be utilized to initiate a series of quantitative studies that analyse (i) similarities between cultural contexts based on their symbologies, (ii) broad symbolic associations, (iii) specific case studies of symbolism such as the relationship between symbols, their colours, and their symbolic meanings. Moreover, I developed a system that can infer symbolic, cultural context-dependent interpretations from artworks according to what they depict, envisioning potential use cases for museum curation. I have then re-engineered the iconographic and iconological statements of Wikidata, a widely used general-domain knowledge base, creating ICONdata: an iconographic and iconological knowledge graph. ICONdata was then enriched with automatic symbolic interpretations. Subsequently, I demonstrated the significance of enhancing artwork information through alignment with linked open data related to symbolism, resulting in the discovery of novel connections between artworks. Finally, I contributed to the creation of a software application. This application leverages established connections, allowing users to investigate the symbolic expression of a concept across different cultural contexts through the generation of a three-dimensional exhibition of artefacts symbolising the chosen concept.
Resumo:
Artificial Intelligence is reshaping the field of fashion industry in different ways. E-commerce retailers exploit their data through AI to enhance their search engines, make outfit suggestions and forecast the success of a specific fashion product. However, it is a challenging endeavour as the data they possess is huge, complex and multi-modal. The most common way to search for fashion products online is by matching keywords with phrases in the product's description which are often cluttered, inadequate and differ across collections and sellers. A customer may also browse an online store's taxonomy, although this is time-consuming and doesn't guarantee relevant items. With the advent of Deep Learning architectures, particularly Vision-Language models, ad-hoc solutions have been proposed to model both the product image and description to solve this problems. However, the suggested solutions do not exploit effectively the semantic or syntactic information of these modalities, and the unique qualities and relations of clothing items. In this work of thesis, a novel approach is proposed to address this issues, which aims to model and process images and text descriptions as graphs in order to exploit the relations inside and between each modality and employs specific techniques to extract syntactic and semantic information. The results obtained show promising performances on different tasks when compared to the present state-of-the-art deep learning architectures.
Resumo:
L’Intelligenza Artificiale negli ultimi anni sta plasmando il futuro dell’umanità in quasi tutti i settori. È già il motore principale di diverse tecnologie emergenti come i big data, la robotica e l’IoT e continuerà ad agire come innovatore tecnologico nel futuro prossimo. Le recenti scoperte e migliorie sia nel campo dell’hardware che in quello matematico hanno migliorato l’efficienza e ridotto i tempi di esecuzione dei software. È in questo contesto che sta evolvendo anche il Natural Language Processing (NLP), un ramo dell’Intelligenza Artificiale che studia il modo in cui fornire ai computer l'abilità di comprendere un testo scritto o parlato allo stesso modo in cui lo farebbe un essere umano. Le ambiguità che distinguono la lingua naturale dalle altre rendono ardui gli studi in questo settore. Molti dei recenti sviluppi algoritmici su NLP si basano su tecnologie inventate decenni fa. La ricerca in questo settore è quindi in continua evoluzione. Questa tesi si pone l'obiettivo di sviluppare la logica di una chatbot help-desk per un'azienda privata. Lo scopo è, sottoposta una domanda da parte di un utente, restituire la risposta associata presente in una collezione domande-risposte. Il problema che questa tesi affronta è sviluppare un modello di NLP in grado di comprendere il significato semantico delle domande in input, poiché esse possono essere formulate in molteplici modi, preservando il contenuto semantico a discapito della sintassi. A causa delle ridotte dimensioni del dataset italiano proprietario su cui testare il modello chatbot, sono state eseguite molteplici sperimentazioni su un ulteriore dataset italiano con task affine. Attraverso diversi approcci di addestramento, tra cui apprendimento metrico, sono state raggiunte alte accuratezze sulle più comuni metriche di valutazione, confermando le capacità del modello proposto e sviluppato.
Resumo:
The aim of this master’s thesis is to study the risky situations of the cyclist when they interact with road infrastructure and other road users as well as the influence of speed on safety. This research activity is linked with the SAFERUP (Sustainable, Accessible, Resilient, and Smart Urban Pavement) European funded project where one of the doctoral candidate has performed experiments on the bicycle simulation at the Gustave Eiffel university in the PICS-L laboratory (Paris) and instrumented bicycle at the Stockholm (Sweden). The approach of the experiment was to hire a number of people who have participated in the riding of the Instrumented bicycle (Stockholm) and bicycle simulator (PICS-L) which were developed by attaching different sensors and devices to measure important parameters of the bicycle riding and their data was collected to analysis in order to understand the behavior of the cyclist to improve the safety. In addition, a mobile eye tracker wore by participants to record the real experiment scenario, and after the end of the trip, each participant shared their remarks regarding their experience of bicycle riding according to different portions of the road infrastructure. In this research main focus is to analyze the relevant data such as speed profiles, video recordings and questionnaire surveys from the instrumented bicycle experiment. In fact, critical situations, where there was a higher probability, were compared with the subjective evaluation of the participant to be conscious of the issues related to the safety and comfort of the cyclist in different road characteristics.
Resumo:
High-throughput screening of physical, genetic and chemical-genetic interactions brings important perspectives in the Systems Biology field, as the analysis of these interactions provides new insights into protein/gene function, cellular metabolic variations and the validation of therapeutic targets and drug design. However, such analysis depends on a pipeline connecting different tools that can automatically integrate data from diverse sources and result in a more comprehensive dataset that can be properly interpreted. We describe here the Integrated Interactome System (IIS), an integrative platform with a web-based interface for the annotation, analysis and visualization of the interaction profiles of proteins/genes, metabolites and drugs of interest. IIS works in four connected modules: (i) Submission module, which receives raw data derived from Sanger sequencing (e.g. two-hybrid system); (ii) Search module, which enables the user to search for the processed reads to be assembled into contigs/singlets, or for lists of proteins/genes, metabolites and drugs of interest, and add them to the project; (iii) Annotation module, which assigns annotations from several databases for the contigs/singlets or lists of proteins/genes, generating tables with automatic annotation that can be manually curated; and (iv) Interactome module, which maps the contigs/singlets or the uploaded lists to entries in our integrated database, building networks that gather novel identified interactions, protein and metabolite expression/concentration levels, subcellular localization and computed topological metrics, GO biological processes and KEGG pathways enrichment. This module generates a XGMML file that can be imported into Cytoscape or be visualized directly on the web. We have developed IIS by the integration of diverse databases following the need of appropriate tools for a systematic analysis of physical, genetic and chemical-genetic interactions. IIS was validated with yeast two-hybrid, proteomics and metabolomics datasets, but it is also extendable to other datasets. IIS is freely available online at: http://www.lge.ibi.unicamp.br/lnbio/IIS/.
Resumo:
The article seeks to investigate patterns of performance and relationships between grip strength, gait speed and self-rated health, and investigate the relationships between them, considering the variables of gender, age and family income. This was conducted in a probabilistic sample of community-dwelling elderly aged 65 and over, members of a population study on frailty. A total of 689 elderly people without cognitive deficit suggestive of dementia underwent tests of gait speed and grip strength. Comparisons between groups were based on low, medium and high speed and strength. Self-related health was assessed using a 5-point scale. The males and the younger elderly individuals scored significantly higher on grip strength and gait speed than the female and oldest did; the richest scored higher than the poorest on grip strength and gait speed; females and men aged over 80 had weaker grip strength and lower gait speed; slow gait speed and low income arose as risk factors for a worse health evaluation. Lower muscular strength affects the self-rated assessment of health because it results in a reduction in functional capacity, especially in the presence of poverty and a lack of compensatory factors.
Resumo:
Obstructive sleep apnea syndrome has a high prevalence among adults. Cephalometric variables can be a valuable method for evaluating patients with this syndrome. To correlate cephalometric data with the apnea-hypopnea sleep index. We performed a retrospective and cross-sectional study that analyzed the cephalometric data of patients followed in the Sleep Disorders Outpatient Clinic of the Discipline of Otorhinolaryngology of a university hospital, from June 2007 to May 2012. Ninety-six patients were included, 45 men, and 51 women, with a mean age of 50.3 years. A total of 11 patients had snoring, 20 had mild apnea, 26 had moderate apnea, and 39 had severe apnea. The distance from the hyoid bone to the mandibular plane was the only variable that showed a statistically significant correlation with the apnea-hypopnea index. Cephalometric variables are useful tools for the understanding of obstructive sleep apnea syndrome. The distance from the hyoid bone to the mandibular plane showed a statistically significant correlation with the apnea-hypopnea index.
Resumo:
In acquired immunodeficiency syndrome (AIDS) studies it is quite common to observe viral load measurements collected irregularly over time. Moreover, these measurements can be subjected to some upper and/or lower detection limits depending on the quantification assays. A complication arises when these continuous repeated measures have a heavy-tailed behavior. For such data structures, we propose a robust structure for a censored linear model based on the multivariate Student's t-distribution. To compensate for the autocorrelation existing among irregularly observed measures, a damped exponential correlation structure is employed. An efficient expectation maximization type algorithm is developed for computing the maximum likelihood estimates, obtaining as a by-product the standard errors of the fixed effects and the log-likelihood function. The proposed algorithm uses closed-form expressions at the E-step that rely on formulas for the mean and variance of a truncated multivariate Student's t-distribution. The methodology is illustrated through an application to an Human Immunodeficiency Virus-AIDS (HIV-AIDS) study and several simulation studies.
Resumo:
To assess the completeness and reliability of the Information System on Live Births (Sinasc) data. A cross-sectional analysis of the reliability and completeness of Sinasc's data was performed using a sample of Live Birth Certificate (LBC) from 2009, related to births from Campinas, Southeast Brazil. For data analysis, hospitals were grouped according to category of service (Unified National Health System, private or both), 600 LBCs were randomly selected and the data were collected in LBC-copies through mothers and newborns' hospital records and by telephone interviews. The completeness of LBCs was evaluated, calculating the percentage of blank fields, and the LBCs agreement comparing the originals with the copies was evaluated by Kappa and intraclass correlation coefficients. The percentage of completeness of LBCs ranged from 99.8%-100%. For the most items, the agreement was excellent. However, the agreement was acceptable for marital status, maternal education and newborn infants' race/color, low for prenatal visits and presence of birth defects, and very low for the number of deceased children. The results showed that the municipality Sinasc is reliable for most of the studied variables. Investments in training of the professionals are suggested in an attempt to improve system capacity to support planning and implementation of health activities for the benefit of maternal and child population.
Resumo:
Often in biomedical research, we deal with continuous (clustered) proportion responses ranging between zero and one quantifying the disease status of the cluster units. Interestingly, the study population might also consist of relatively disease-free as well as highly diseased subjects, contributing to proportion values in the interval [0, 1]. Regression on a variety of parametric densities with support lying in (0, 1), such as beta regression, can assess important covariate effects. However, they are deemed inappropriate due to the presence of zeros and/or ones. To evade this, we introduce a class of general proportion density, and further augment the probabilities of zero and one to this general proportion density, controlling for the clustering. Our approach is Bayesian and presents a computationally convenient framework amenable to available freeware. Bayesian case-deletion influence diagnostics based on q-divergence measures are automatic from the Markov chain Monte Carlo output. The methodology is illustrated using both simulation studies and application to a real dataset from a clinical periodontology study.
Resumo:
Patients with obstructive sleep apnea syndrome usually present with changes in upper airway morphology and/or body fat distribution, which may occur throughout life and increase the severity of obstructive sleep apnea syndrome with age. To correlate cephalometric and anthropometric measures with obstructive sleep apnea syndrome severity in different age groups. A retrospective study of cephalometric and anthropometric measures of 102 patients with obstructive sleep apnea syndrome was analyzed. Patients were divided into three age groups (≥20 and <40 years, ≥40 and <60 years, and ≥60 years). Pearson's correlation was performed for these measures with the apnea-hypopnea index in the full sample, and subsequently by age group. The cephalometric measures MP-H (distance between the mandibular plane and the hyoid bone) and PNS-P (distance between the posterior nasal spine and the tip of the soft palate) and the neck and waist circumferences showed a statistically significant correlation with apnea-hypopnea index in both the full sample and in the ≥40 and <60 years age group. These variables did not show any significant correlation with the other two age groups (<40 and ≥60 years). Cephalometric measurements MP-H and PNS-P and cervical and waist circumferences correlated with obstructive sleep apnea syndrome severity in patients in the ≥40 and <60 age group.
Resumo:
The syndrome of resistance to thyroid hormone (RTH β) is an inherited disorder characterized by variable tissue hyposensitivity to 3,5,30-l-triiodothyronine (T3), with persistent elevation of free-circulating T3 (FT3) and free thyroxine (FT4) levels in association with nonsuppressed serum thyrotropin (TSH). Clinical presentation is variable and the molecular analysis of THRB gene provides a short cut diagnosis. Here, we describe 2 cases in which RTH β was suspected on the basis of laboratory findings. The diagnosis was confirmed by direct THRB sequencing that revealed 2 novel mutations: the heterozygous p.Ala317Ser in subject 1 and the heterozygous p.Arg438Pro in subject 2. Both mutations were shown to be deleterious by SIFT, PolyPhen, and Align GV-GD predictive methods.