11 results for classification methods

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

60.00%

Publisher:

Abstract:

The present study is part of the EU Integrated Project “GEHA – Genetics of Healthy Aging” (Franceschi C et al., Ann N Y Acad Sci. 1100: 21-45, 2007), whose aim is to identify genes involved in healthy aging and longevity, which allow individuals to survive to advanced age in good cognitive and physical function and in the absence of major age-related diseases. Aims The major aims of this thesis were the following: 1. to outline the recruitment procedure for 90+ Italian siblings performed by the recruiting units of the University of Bologna (UNIBO) and Rome (ISS). The procedures related to the following items necessary to perform the study were described and commented on: identification of the eligible area for recruitment, demographic aspects related to the need of obtaining census lists of 90+ siblings, mail and phone contact with 90+ subjects and their families, bioethics aspects of the whole procedure, standardization of the recruitment methodology and set-up of a detailed flow chart to be followed by the European recruitment centres (obtaining the informed consent form, anonymization of data by using a special code, how to perform the interview, how to collect the blood, how to enter data in the GEHA Phenotypic Data Base hosted at Odense). 2. to provide an overview of the phenotypic characteristics of 90+ Italian siblings recruited by the recruiting units of the University of Bologna (UNIBO) and Rome (ISS). The following items were addressed: socio-demographic characteristics, health status, cognitive assessment, physical conditions (handgrip strength test, chair-stand test, physical ability including ADL, vision and hearing ability, movement ability and doing light housework), life-style information (smoking and drinking habits) and subjective well-being (attitude towards life). 
Moreover, haematological parameters collected in the 90+ sibpairs as optional parameters by the Bologna and Rome recruiting units were used for a more comprehensive evaluation of the results obtained using the above-mentioned phenotypic characteristics reported in the GEHA questionnaire. 3. to assess 90+ Italian siblings as far as their health/functional status is concerned on the basis of three classification methods proposed in previous studies on centenarians, which are based on: • actual functional capabilities (ADL, SMMSE, visual and hearing abilities) (Gondo et al., J Gerontol. 61A (3): 305-310, 2006); • actual functional capabilities and morbidity (ADL, ability to walk, SMMSE, presence of cancer, stroke, renal failure, anaemia, and liver diseases) (Franceschi et al., Aging Clin Exp Res, 12:77-84, 2000); • retrospectively collected data about past history of morbidity and age of disease onset (hypertension, heart disease, diabetes, stroke, cancer, osteoporosis, neurological diseases, chronic obstructive pulmonary disease and ocular diseases) (Evert et al., J Gerontol A Biol Sci Med Sci. 58A (3): 232-237, 2003). Firstly, these available models to define the health status of long-living subjects were applied to the sample and, since the classifications by Gondo and Franceschi are both based on the present functional status, they were compared in order to better recognize the healthy aging phenotype and to identify the best group of 90+ subjects out of the entire studied population. 4. to investigate the concordance of health and functional status among 90+ siblings in order to divide sibpairs into three categories: the best (both sibs are in good shape), the worst (both sibs are in bad shape) and an intermediate group (one sib is in good shape and the other is in bad shape). 
Moreover, the evaluation aimed to identify which variables are concordant among siblings; concordant variables could be considered familial variables (determined by the environment or by genetics). 5. to perform a survival analysis using mortality data as of 1 January 2009 from the follow-up as the main outcome and selected functional and clinical parameters as explanatory variables. Methods A total of 765 90+ Italian subjects recruited by the UNIBO (549 90+ siblings, belonging to 258 families) and ISS (216 90+ siblings, belonging to 106 families) recruiting units were included in the analysis. Each subject was interviewed according to a standardized questionnaire, comprising extensively utilized questions that had been validated in previous European studies on elderly subjects and covering demographic information, life style, living conditions, cognitive status (SMMSE), mood, health status and anthropometric measurements. Moreover, subjects were asked to perform some physical tests (Hand Grip Strength test and Chair Standing test), and a sample of about 24 mL of blood was collected and then processed according to a common protocol for the preparation and storage of DNA aliquots. 
Results From the analysis the main findings are the following: - a standardized protocol to assess the cognitive status, physical performance and health status of European nonagenarian subjects was set up in compliance with ethical requirements, and it is available as a reference for other studies in this field; - GEHA families are enriched in long-living members and extreme survival, and represent an appropriate model for the identification of genes involved in healthy aging and longevity; - two simplified sets of criteria to classify 90+ siblings according to their health status were proposed, as operational tools for distinguishing healthy from non-healthy subjects; - cognitive and functional parameters have a major role in categorizing 90+ siblings by health status; - parameters such as education and good physical abilities (500-metre walking ability, ability to go up and down stairs, high scores on the hand grip and chair stand tests) are associated with a good health status (defined as “cognitive unimpairment and absence of disability”); - male nonagenarians show a more homogeneous phenotype than females and, though far fewer in number, tend to be healthier than females; - in males, good health status is not protective for survival, confirming the male-female health-survival paradox; - survival after age 90 depended mainly on intact cognitive status and absence of functional disabilities; - haemoglobin and creatinine levels are both associated with longevity; - the most concordant items among 90+ siblings are related to functional status, indicating that they contain a familial component. It remains to be investigated to what extent this familial component is determined by genetics, by the environment, or by the interaction between genetics, environment and chance. 
Conclusions In conclusion, we can state that this study, in accordance with the main objectives of the whole GEHA project, represents one of the first attempts to identify the biological and non-biological determinants of successful/unsuccessful aging and longevity. Here, the analysis was performed on 90+ siblings recruited in Northern and Central Italy, and it could be used as a reference for other studies in this field on the Italian population. Moreover, it contributed to the definition of “successful” and “unsuccessful” aging, and categorising a very large cohort of our most elderly subjects into “successful” and “unsuccessful” groups provided an unrivalled opportunity to detect some of the basic genetic/molecular mechanisms which underpin good health as opposed to chronic disability. Discoveries concerning the biological determinants of healthy aging represent a real possibility to identify new markers to be utilized for the identification of subgroups of old European citizens at higher risk of developing age-related diseases and disabilities, and to direct major preventive medicine strategies for the new epidemic of chronic disease in the 21st century.

Relevance:

60.00%

Publisher:

Abstract:

INTRODUCTION: Esophageal adenocarcinoma (EAC) is a severe malignancy in terms of prognosis and mortality rate. Because of its great genetic heterogeneity, disputes regarding classification, prevention and treatment remain unresolved. AIM: We investigated intra- and inter-EAC heterogeneity by defining EAC’s somatic mutational profile and the role of candidate microRNAs, to correlate the molecular profile of tumors with clinical outcomes and to identify biomarkers for classification. METHODS: 38 EAC cases were analyzed via high-throughput cell sorting technology combined with targeted sequencing and whole genome low-pass sequencing. Targeted sequencing of a further 169 cases was performed to widen the study. miR221 and miR483-3p expression was profiled via qPCR in 112 EACs and correlation with clinical outcomes was investigated. RESULTS: 35/38 EACs carried at least one somatic mutation absent in stromal cells. TP53 was found mutated in 73.7% of cases. Selective sorting revealed tumor subclones with different mutational loads and copy number alterations, confirming the high intra-tumor heterogeneity of EAC. Mutations were in most cases in the homozygous state, and we identified alterations that were missed by whole-tumor analysis. Mutations in the HNF1A gene, not previously associated with EAC, were identified in both cohorts. Higher expression of miR483-3p and miR221 was associated with poorer cancer-specific survival (P=0.0293 and P=0.0059) and with recurrence in the Lauren intestinal subtype (P=0.0459 and P=0.0002). Median expression levels of miRNAs were higher in patients with advanced tumor stages. The loss of SMAD4 immunoreactivity was significantly associated with poorer cancer-specific survival and recurrence (P=0.0452 and P=0.022, respectively). CONCLUSION: Combining selective sorting technology and next generation sequencing allowed us to better define EAC inter- and intra-tumor heterogeneity. 
We identified HNF1A as a newly mutated gene associated with EAC that could be involved in tumor progression, and promising biomarkers such as SMAD4, miR221 and miR483-3p to identify patients at higher risk of more aggressive tumors.

Relevance:

40.00%

Publisher:

Abstract:

Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in the form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into predefined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a substantial training set and notable computational effort. Methods for cross-domain text categorization have been proposed, which leverage a set of labeled documents from one domain to classify those of another. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between their representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. 
Results show that classification accuracy still requires improvement, but models generated in one domain can be effectively reused in a different one.
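The nearest centroid step underlying the first contribution can be sketched minimally as follows. This is a generic illustration with hypothetical names, assuming TF-IDF-style document vectors; the iterative adaptation of category profiles to the unknown domain described in the abstract is omitted:

```python
import numpy as np

def centroid_classify(train_vecs, train_labels, test_vecs):
    """Nearest centroid classification with cosine similarity.

    train_vecs/test_vecs: 2D arrays of document feature vectors
    (e.g. TF-IDF); train_labels: one category label per training row.
    """
    labels = sorted(set(train_labels))
    centroids = []
    for lab in labels:
        # Profile of a category: normalized mean of its documents.
        members = train_vecs[[i for i, l in enumerate(train_labels) if l == lab]]
        c = members.mean(axis=0)
        centroids.append(c / (np.linalg.norm(c) or 1.0))
    centroids = np.array(centroids)
    # Cosine similarity: normalize test vectors, then take dot products.
    norms = np.linalg.norm(test_vecs, axis=1, keepdims=True)
    norms[norms == 0] = 1.0
    sims = (test_vecs / norms) @ centroids.T
    return [labels[i] for i in sims.argmax(axis=1)]
```

In the cross-domain setting described above, the category profiles would then be iteratively re-estimated from the most confidently classified documents of the target domain.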

Relevance:

30.00%

Publisher:

Abstract:

Statistical modelling and statistical learning theory are two powerful analytical frameworks for analyzing signals and developing efficient processing and classification algorithms. In this thesis, these frameworks are applied to modelling and processing biomedical signals in two different contexts: ultrasound medical imaging systems and primate neural activity analysis and modelling. In the context of ultrasound medical imaging, two main applications are explored: deconvolution of signals measured from an ultrasonic transducer and automatic image segmentation and classification of prostate ultrasound scans. In the former application, a stochastic model of the radio frequency signal measured from an ultrasonic transducer is derived. This model is then employed to develop, within a statistical framework, a regularized deconvolution procedure for enhancing signal resolution. In the latter application, different statistical models are used to characterize images of prostate tissues, extracting different features. These features are then used to segment the images into regions of interest by means of an automatic procedure based on a statistical model of the extracted features. Finally, machine learning techniques are used for automatic classification of the different regions of interest. In the context of neural activity signals, a bio-inspired dynamical network was developed to support studies of motor-related processes in the brain of primate monkeys. The presented model aims to mimic the abstract functionality of a cell population in the 7a parietal region of primate monkeys during the execution of learned behavioural tasks.
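As an illustration of what such a regularized deconvolution can look like, here is a generic frequency-domain Tikhonov sketch. It is a textbook closed form, not the stochastic model actually derived in the thesis, and `lam` and the impulse response are illustrative assumptions:

```python
import numpy as np

def regularized_deconvolve(y, h, lam=1e-2):
    """Tikhonov-regularized deconvolution in the frequency domain.

    y: measured signal; h: assumed impulse response of the transducer;
    lam: regularization weight trading resolution against noise
    amplification. Returns the minimizer of
        ||h * x - y||^2 + lam * ||x||^2   (circular convolution),
    computed in closed form via the FFT.
    """
    n = len(y)
    H = np.fft.fft(h, n)
    Y = np.fft.fft(y, n)
    X = np.conj(H) * Y / (np.abs(H) ** 2 + lam)
    return np.real(np.fft.ifft(X))
```

Larger `lam` suppresses noise amplification at frequencies where the transducer response is weak, at the cost of resolution; `lam -> 0` approaches the (unstable) inverse filter.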

Relevance:

30.00%

Publisher:

Abstract:

Machine learning comprises a series of techniques for the automatic extraction of meaningful information from large collections of noisy data. In many real-world applications, data is naturally represented in structured form. Since traditional methods in machine learning deal with vectorial information, they require an a priori form of preprocessing. Among the learning techniques for dealing with structured data, kernel methods are recognized to have a strong theoretical background and to be effective approaches. They do not require an explicit vectorial representation of the data in terms of features, but rely on a measure of similarity between any pair of objects of a domain, the kernel function. Designing fast and good kernel functions is a challenging problem. In the case of tree-structured data two issues become relevant: kernels for trees should not be sparse and should be fast to compute. The sparsity problem arises when, given a dataset and a kernel function, most structures of the dataset are completely dissimilar to one another. In those cases the classifier has too little information to make correct predictions on unseen data; in fact, it tends to produce a discriminating function behaving like the nearest neighbour rule. Sparsity is likely to arise for some standard tree kernel functions, such as the subtree and subset tree kernels, when they are applied to datasets with node labels belonging to a large domain. A second drawback of using tree kernels is the time complexity required in both the learning and classification phases. Such complexity can sometimes prevent kernel application in scenarios involving large amounts of data. This thesis proposes three contributions for resolving the above issues of kernels for trees. A first contribution aims at creating kernel functions which adapt to the statistical properties of the dataset, thus reducing their sparsity with respect to traditional tree kernel functions. 
Specifically, we propose to encode the input trees by an algorithm able to project the data onto a lower-dimensional space with the property that similar structures are mapped similarly. By building kernel functions on the lower-dimensional representation, we are able to perform inexact matchings between different inputs in the original space. A second contribution is the proposal of a novel kernel function based on the convolution kernel framework. A convolution kernel measures the similarity of two objects in terms of the similarities of their subparts. Most convolution kernels are based on counting the number of shared substructures, partially discarding information about their position in the original structure. The kernel function we propose is, instead, especially focused on this aspect. A third contribution is devoted to reducing the computational burden related to the calculation of a kernel function between a tree and a forest of trees, which is a typical operation in the classification phase and, for some algorithms, also in the learning phase. We propose a general methodology applicable to convolution kernels. Moreover, we show an instantiation of our technique when kernels such as the subtree and subset tree kernels are employed. In those cases, Directed Acyclic Graphs can be used to compactly represent shared substructures in different trees, thus reducing the computational burden and storage requirements.
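As background for the kernels discussed above, here is a minimal sketch of a Collins-Duffy-style subset tree kernel. It is the standard textbook formulation, not the thesis's adapted kernels; trees are encoded as hypothetical (label, children) pairs:

```python
def tree_kernel(t1, t2, lam=1.0):
    """Subset-tree-style convolution kernel: counts shared tree
    fragments, with lam a downweighting factor for larger fragments.
    Each node is a (label, children) pair, children a list of nodes.
    """
    def nodes(t):
        out = [t]
        for c in t[1]:
            out.extend(nodes(c))
        return out

    def prod(n):
        # Production at a node: its label plus the sequence of child labels.
        return (n[0], tuple(c[0] for c in n[1]))

    def C(n1, n2):
        # Weighted count of common fragments rooted at n1 and n2.
        if prod(n1) != prod(n2):
            return 0.0
        if not n1[1]:  # matching leaves
            return lam
        s = lam
        for c1, c2 in zip(n1[1], n2[1]):
            s *= 1.0 + C(c1, c2)
        return s

    return sum(C(a, b) for a in nodes(t1) for b in nodes(t2))
```

The DAG representation mentioned in the third contribution compacts repeated substructures across a forest, so values of the inner recursion are computed once per shared fragment rather than once per tree pair.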

Relevance:

30.00%

Publisher:

Abstract:

Different types of proteins exist, with diverse functions that are essential for living organisms. An important class of proteins is represented by transmembrane proteins, which are specifically designed to be inserted into biological membranes and perform very important functions in the cell, such as cell communication and active transport across the membrane. Transmembrane β-barrels (TMBBs) are a sub-class of membrane proteins largely under-represented in structure databases because of the extreme difficulty of experimental structure determination. For this reason, computational tools able to predict the structure of TMBBs are needed. In this thesis, two computational problems related to TMBBs were addressed: the detection of TMBBs in large datasets of proteins and the prediction of the topology of TMBB proteins. Firstly, a method for TMBB detection was presented, based on a novel neural network framework for variable-length sequence classification. The proposed approach was validated on a non-redundant dataset of proteins. Furthermore, we carried out genome-wide detection using the entire Escherichia coli proteome. In both experiments, the method significantly outperformed other existing state-of-the-art approaches, reaching very high PPV (92%) and MCC (0.82). Secondly, a method was also introduced for TMBB topology prediction. The proposed approach is based on grammatical modelling and probabilistic discriminative models for sequence data labeling. The method was evaluated using a newly generated dataset of 38 TMBB proteins obtained from high-resolution data in the PDB. Results show that the model is able to correctly predict the topologies of 25 out of 38 protein chains in the dataset. When tested on previously released datasets, the performance of the proposed approach was comparable or superior to the current state of the art in TMBB topology prediction.

Relevance:

30.00%

Publisher:

Abstract:

The world of Computational Biology and Bioinformatics presently integrates many different areas of expertise, including computer science and electronic engineering. A major aim in Data Science is the development and tuning of specific computational approaches to interpret the complexity of Biology. Molecular biologists and medical doctors heavily rely on interdisciplinary experts capable of understanding the biological background and applying algorithms to find optimal solutions to their problems. With this problem-solving orientation, I was involved in two basic research fields: Cancer Genomics and Enzyme Proteomics. What I developed and implemented can therefore be considered a general effort to support data analysis both in Cancer Genomics and in Enzyme Proteomics, focusing on enzymes, which catalyse all the biochemical reactions in cells. Specifically, in Cancer Genomics I contributed to the characterization of the intratumoral immune microenvironment in gastrointestinal stromal tumours (GISTs), correlating immune cell population levels with tumour subtypes. I was involved in the setup of strategies for the evaluation and standardization of different approaches for fusion transcript detection in sarcomas that can be applied in routine diagnostics. This was part of a coordinated effort of the Sarcoma working group of "Alleanza Contro il Cancro". In Enzyme Proteomics, I generated a derived database collecting all the human proteins and enzymes known to be associated with genetic disease. I curated the data search in freely available databases such as PDB, UniProt, Humsavar and ClinVar, and I was responsible for searching, updating and handling the information content, and computing statistics. I also developed a web server, BENZ, which allows researchers to annotate an enzyme sequence with the corresponding Enzyme Commission number, the key feature fully describing the catalysed reaction. 
In addition, I contributed substantially to the characterization of the enzyme-genetic disease association, enabling a better classification of metabolic genetic diseases.

Relevance:

30.00%

Publisher:

Abstract:

The dissertation addresses the still unsolved challenges of source-based digital 3D reconstruction, visualisation and documentation in the domains of archaeology, art and architecture history. The emerging BIM methodology and the IFC data exchange format are changing the way of collaboration, visualisation and documentation in the planning, construction and facility management process. The introduction and development of the Semantic Web (Web 3.0), spreading the idea of structured, formalised and linked data, offers semantically enriched human- and machine-readable data. In contrast to civil engineering and cultural heritage, academic object-oriented disciplines, like archaeology, art and architecture history, are acting as outside spectators. Since the 1990s, it has been argued that a 3D model is not likely to be considered a scientific reconstruction unless it is grounded on accurate documentation and visualisation. However, these standards are still missing and the validation of the outcomes is not ensured. Meanwhile, the digital research data remain ephemeral and continue to fill the growing digital cemeteries. This study focuses, therefore, on the evaluation of source-based digital 3D reconstructions and, especially, on uncertainty assessment in the case of hypothetical reconstructions of destroyed or never-built artefacts according to scientific principles, making the models shareable and reusable by a potentially wide audience. The work initially focuses on terminology and on the definition of a workflow especially related to the classification and visualisation of uncertainty. The workflow is then applied to specific cases of 3D models uploaded to the DFG repository of the AI Mainz. In this way, the available methods of documenting, visualising and communicating uncertainty are analysed. 
In the end, this process leads to a validation or correction of the workflow and the initial assumptions and, where different hypotheses are involved, to a better definition of the levels of uncertainty.

Relevance:

30.00%

Publisher:

Abstract:

In this work, we explore and demonstrate the potential for modeling and classification using quantile-based distributions, which are random variables defined by their quantile function. In the first part, we formalize a least squares estimation framework for the class of linear quantile functions, leading to unbiased and asymptotically normal estimators. Among the distributions with a linear quantile function, we focus on the flattened generalized logistic distribution (fgld), which offers a wide range of distributional shapes. A novel naïve-Bayes classifier is proposed that utilizes the fgld estimated via least squares, and through simulations and applications we demonstrate its competitiveness against state-of-the-art alternatives. In the second part, we consider the Bayesian estimation of quantile-based distributions. We introduce a factor model with independent latent variables, which are distributed according to the fgld. Similar to the independent factor analysis model, this approach accommodates flexible factor distributions while using fewer parameters. The model is presented within a Bayesian framework, an MCMC algorithm for its estimation is developed, and its effectiveness is illustrated with data from the European Social Survey. The third part focuses on depth functions, which extend the concept of quantiles to multivariate data by imposing a center-outward ordering in the multivariate space. We investigate the recently introduced integrated rank-weighted (IRW) depth function, which is based on the distribution of random spherical projections of the multivariate data. This depth function proves to be computationally efficient and, to increase its flexibility, we propose different methods to explicitly model the projected univariate distributions. 
Its usefulness is shown in classification tasks: the maximum depth classifier based on the IRW depth is proven to be asymptotically optimal under certain conditions, and classifiers based on the IRW depth are shown to perform well in simulated and real data experiments.
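A Monte Carlo sketch of the IRW depth as commonly defined (averaging, over random directions on the sphere, the minimum of the empirical CDF of the projections and its complement); the thesis's exact formulation, and its explicit models for the projected distributions, may differ:

```python
import numpy as np

def irw_depth(x, data, n_dirs=500, rng=None):
    """Approximate integrated rank-weighted (IRW) depth of point x
    with respect to a data cloud, via random sphere projections."""
    if rng is None:
        rng = np.random.default_rng(0)
    u = rng.normal(size=(n_dirs, data.shape[1]))
    u /= np.linalg.norm(u, axis=1, keepdims=True)  # directions on the unit sphere
    proj_data = data @ u.T   # shape (n_points, n_dirs)
    proj_x = x @ u.T         # shape (n_dirs,)
    # Empirical CDF of each projected sample, evaluated at the projection of x.
    F = (proj_data <= proj_x).mean(axis=0)
    return float(np.minimum(F, 1.0 - F).mean())
```

A maximum depth classifier then assigns a new point to the class whose training sample gives it the largest depth.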

Relevance:

30.00%

Publisher:

Abstract:

The abundance of visual data and the push for robust AI are driving the need for automated visual sensemaking. Computer Vision (CV) faces growing demand for models that can discern not only what images "represent," but also what they "evoke." This is a demand for tools mimicking human perception at a high semantic level, categorizing images based on concepts like freedom, danger, or safety. However, automating this process is challenging due to entropy, scarcity, subjectivity, and ethical considerations. These challenges not only impact performance but also underscore the critical need for interpretability. This dissertation focuses on abstract concept-based (AC) image classification, guided by three technical principles: situated grounding, performance enhancement, and interpretability. We introduce ART-stract, a novel dataset of cultural images annotated with ACs, serving as the foundation for a series of experiments across four key domains: assessing the effectiveness of the end-to-end DL paradigm, exploring cognitive-inspired semantic intermediaries, incorporating cultural and commonsense aspects, and neuro-symbolic integration of sensory-perceptual data with cognitive-based knowledge. Our results demonstrate that integrating CV approaches with semantic technologies yields methods that surpass the current state of the art in AC image classification, outperforming the end-to-end deep vision paradigm. The results emphasize the role semantic technologies can play in developing both effective and interpretable systems, through the capturing, situating, and reasoning over knowledge related to visual data. Furthermore, this dissertation explores the complex interplay between technical and socio-technical factors. By merging technical expertise with an understanding of human and societal aspects, we advocate for responsible labeling and training practices in visual media. 
These insights and techniques not only advance efforts in CV and explainable artificial intelligence but also propel us toward an era of AI development that harmonizes technical prowess with deep awareness of its human and societal implications.

Relevance:

30.00%

Publisher:

Abstract:

Introduction: Recently, the American Association of Gynecologic Laparoscopists (AAGL) proposed a new classification and scoring system with the specific aim of assessing surgical complexity. This study sought to assess whether a higher AAGL score correlates with an increased risk of peri-operative complications in women undergoing surgery for endometriosis. Methods: This is a retrospective cohort study conducted in a third-level referral center. We collected data from women with endometriosis who underwent complete surgical removal of endometriosis from January 2019 to December 2021. The ENZIAN and r-ASRM classifications and the AAGL total score were calculated for each patient. The population was divided into two groups according to whether at least one peri-operative complication occurred. Our primary outcome was to evaluate the correlation between the AAGL score and the occurrence of complications. Results: During the study period we analyzed data from 282 eligible patients. Among them, 80 (28.4%) experienced peri-operative complications. No statistically significant difference was found between the two groups in terms of baseline characteristics, except for pre-operative hemoglobin (Hb), which was lower in patients with complications (p=0.001). Surgical variables associated with the occurrence of complications were recto-sigmoid surgery (p=0.003), ileocecal resection (p=0.034), and longer operative time (p=0.007). Furthermore, a higher ENZIAN B score (p=0.006), AAGL score (p=0.045) and stage (p=0.022) were found in the group of patients with complications. The multivariate analysis only confirmed the significant association between the occurrence of peri-operative complications and lower pre-operative Hb level (OR 0.74; 95% CI, 0.59 - 0.94; p=0.014), longer operative time (OR 1.00; 95% CI, 1.00 – 1.01; p=0.013), recto-sigmoid surgery, especially discoid resection (OR 8.73; 95% CI, 2.18 – 35; p=0.016), and ENZIAN B3 (OR 3.62; 95% CI, 1.46 – 8.99; p=0.006). 
Conclusion: According to our findings, high AAGL scores or stages do not seem to increase the risk of peri-operative complications.