8 resultados para Classification model stakeholders

em AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Hematological cancers are a heterogeneous family of diseases that can be divided into leukemias, lymphomas, and myelomas, often called “liquid tumors”. Since they cannot be surgically removable, chemotherapy represents the mainstay of their treatment. However, it still faces several challenges like drug resistance and low response rate, and the need for new anticancer agents is compelling. The drug discovery process is long-term, costly, and prone to high failure rates. With the rapid expansion of biological and chemical "big data", some computational techniques such as machine learning tools have been increasingly employed to speed up and economize the whole process. Machine learning algorithms can create complex models with the aim to determine the biological activity of compounds against several targets, based on their chemical properties. These models are defined as multi-target Quantitative Structure-Activity Relationship (mt-QSAR) and can be used to virtually screen small and large chemical libraries for the identification of new molecules with anticancer activity. The aim of my Ph.D. project was to employ machine learning techniques to build an mt-QSAR classification model for the prediction of cytotoxic drugs simultaneously active against 43 hematological cancer cell lines. For this purpose, first, I constructed a large and diversified dataset of molecules extracted from the ChEMBL database. Then, I compared the performance of different ML classification algorithms, until Random Forest was identified as the one returning the best predictions. Finally, I used different approaches to maximize the performance of the model, which achieved an accuracy of 88% by correctly classifying 93% of inactive molecules and 72% of active molecules in a validation set. This model was further applied to the virtual screening of a small dataset of molecules tested in our laboratory, where it showed 100% accuracy in correctly classifying all molecules. This result is confirmed by our previous in vitro experiments.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The work presented in this thesis is focused on the open-ended coaxial-probe frequency-domain reflectometry technique for complex permittivity measurement at microwave frequencies of dispersive dielectric multilayer materials. An effective dielectric model is introduced and validated to extend the applicability of this technique to multilayer materials in on-line system context. In addition, the thesis presents: 1) a numerical study regarding the imperfectness of the contact at the probe-material interface, 2) a review of the available models and techniques, 3) a new classification of the extraction schemes with guidelines on how they can be used to improve the overall performance of the probe according to the problem requirements.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The diagnosis, grading and classification of tumours has benefited considerably from the development of DCE-MRI which is now essential to the adequate clinical management of many tumour types due to its capability in detecting active angiogenesis. Several strategies have been proposed for DCE-MRI evaluation. Visual inspection of contrast agent concentration curves vs time is a very simple yet operator dependent procedure, therefore more objective approaches have been developed in order to facilitate comparison between studies. In so called model free approaches, descriptive or heuristic information extracted from time series raw data have been used for tissue classification. The main issue concerning these schemes is that they have not a direct interpretation in terms of physiological properties of the tissues. On the other hand, model based investigations typically involve compartmental tracer kinetic modelling and pixel-by-pixel estimation of kinetic parameters via non-linear regression applied on region of interests opportunely selected by the physician. This approach has the advantage to provide parameters directly related to the pathophysiological properties of the tissue such as vessel permeability, local regional blood flow, extraction fraction, concentration gradient between plasma and extravascular-extracellular space. Anyway, nonlinear modelling is computational demanding and the accuracy of the estimates can be affected by the signal-to-noise ratio and by the initial solutions. The principal aim of this thesis is investigate the use of semi-quantitative and quantitative parameters for segmentation and classification of breast lesion. The objectives can be subdivided as follow: describe the principal techniques to evaluate time intensity curve in DCE-MRI with focus on kinetic model proposed in literature; to evaluate the influence in parametrization choice for a classic bi-compartmental kinetic models; to evaluate the performance of a method for simultaneous tracer kinetic modelling and pixel classification; to evaluate performance of machine learning techniques training for segmentation and classification of breast lesion.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from great amounts of data. As most data comes in form of unstructured text in natural languages, research on text mining is currently very active and dealing with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents in priorly defined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a consistent training set and notable computational effort. Methods for cross-domain text categorization have been proposed, allowing to leverage a set of labeled documents of one domain to classify those of another one. Most methods use advanced statistical techniques, usually involving tuning of parameters. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy in most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvements, but models generated from one domain are shown to be effectively able to be reused in a different one.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In these last years a great effort has been put in the development of new techniques for automatic object classification, also due to the consequences in many applications such as medical imaging or driverless cars. To this end, several mathematical models have been developed from logistic regression to neural networks. A crucial aspect of these so called classification algorithms is the use of algebraic tools to represent and approximate the input data. In this thesis, we examine two different models for image classification based on a particular tensor decomposition named Tensor-Train (TT) decomposition. The use of tensor approaches preserves the multidimensional structure of the data and the neighboring relations among pixels. Furthermore the Tensor-Train, differently from other tensor decompositions, does not suffer from the curse of dimensionality making it an extremely powerful strategy when dealing with high-dimensional data. It also allows data compression when combined with truncation strategies that reduce memory requirements without spoiling classification performance. The first model we propose is based on a direct decomposition of the database by means of the TT decomposition to find basis vectors used to classify a new object. The second model is a tensor dictionary learning model, based on the TT decomposition where the terms of the decomposition are estimated using a proximal alternating linearized minimization algorithm with a spectral stepsize.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The fast development of Information Communication Technologies (ICT) offers new opportunities to realize future smart cities. To understand, manage and forecast the city's behavior, it is necessary the analysis of different kinds of data from the most varied dataset acquisition systems. The aim of this research activity in the framework of Data Science and Complex Systems Physics is to provide stakeholders with new knowledge tools to improve the sustainability of mobility demand in future cities. Under this perspective, the governance of mobility demand generated by large tourist flows is becoming a vital issue for the quality of life in Italian cities' historical centers, which will worsen in the next future due to the continuous globalization process. Another critical theme is sustainable mobility, which aims to reduce private transportation means in the cities and improve multimodal mobility. We analyze the statistical properties of urban mobility of Venice, Rimini, and Bologna by using different datasets provided by companies and local authorities. We develop algorithms and tools for cartography extraction, trips reconstruction, multimodality classification, and mobility simulation. We show the existence of characteristic mobility paths and statistical properties depending on transport means and user's kinds. Finally, we use our results to model and simulate the overall behavior of the cars moving in the Emilia Romagna Region and the pedestrians moving in Venice with software able to replicate in silico the demand for mobility and its dynamic.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

The dissertation addresses the still not solved challenges concerned with the source-based digital 3D reconstruction, visualisation and documentation in the domain of archaeology, art and architecture history. The emerging BIM methodology and the exchange data format IFC are changing the way of collaboration, visualisation and documentation in the planning, construction and facility management process. The introduction and development of the Semantic Web (Web 3.0), spreading the idea of structured, formalised and linked data, offers semantically enriched human- and machine-readable data. In contrast to civil engineering and cultural heritage, academic object-oriented disciplines, like archaeology, art and architecture history, are acting as outside spectators. Since the 1990s, it has been argued that a 3D model is not likely to be considered a scientific reconstruction unless it is grounded on accurate documentation and visualisation. However, these standards are still missing and the validation of the outcomes is not fulfilled. Meanwhile, the digital research data remain ephemeral and continue to fill the growing digital cemeteries. This study focuses, therefore, on the evaluation of the source-based digital 3D reconstructions and, especially, on uncertainty assessment in the case of hypothetical reconstructions of destroyed or never built artefacts according to scientific principles, making the models shareable and reusable by a potentially wide audience. The work initially focuses on terminology and on the definition of a workflow especially related to the classification and visualisation of uncertainty. The workflow is then applied to specific cases of 3D models uploaded to the DFG repository of the AI Mainz. In this way, the available methods of documenting, visualising and communicating uncertainty are analysed. In the end, this process will lead to a validation or a correction of the workflow and the initial assumptions, but also (dealing with different hypotheses) to a better definition of the levels of uncertainty.

Relevância:

30.00% 30.00%

Publicador:

Resumo:

In this work, we explore and demonstrate the potential for modeling and classification using quantile-based distributions, which are random variables defined by their quantile function. In the first part we formalize a least squares estimation framework for the class of linear quantile functions, leading to unbiased and asymptotically normal estimators. Among the distributions with a linear quantile function, we focus on the flattened generalized logistic distribution (fgld), which offers a wide range of distributional shapes. A novel naïve-Bayes classifier is proposed that utilizes the fgld estimated via least squares, and through simulations and applications, we demonstrate its competitiveness against state-of-the-art alternatives. In the second part we consider the Bayesian estimation of quantile-based distributions. We introduce a factor model with independent latent variables, which are distributed according to the fgld. Similar to the independent factor analysis model, this approach accommodates flexible factor distributions while using fewer parameters. The model is presented within a Bayesian framework, an MCMC algorithm for its estimation is developed, and its effectiveness is illustrated with data coming from the European Social Survey. The third part focuses on depth functions, which extend the concept of quantiles to multivariate data by imposing a center-outward ordering in the multivariate space. We investigate the recently introduced integrated rank-weighted (IRW) depth function, which is based on the distribution of random spherical projections of the multivariate data. This depth function proves to be computationally efficient and to increase its flexibility we propose different methods to explicitly model the projected univariate distributions. Its usefulness is shown in classification tasks: the maximum depth classifier based on the IRW depth is proven to be asymptotically optimal under certain conditions, and classifiers based on the IRW depth are shown to perform well in simulated and real data experiments.