11 results for Gender classification model

in AMS Tesi di Dottorato - Alm@DL - Università di Bologna


Relevance:

80.00%

Abstract:

Hematological cancers are a heterogeneous family of diseases that can be divided into leukemias, lymphomas, and myelomas, often called “liquid tumors”. Since they cannot be surgically removed, chemotherapy is the mainstay of their treatment. However, chemotherapy still faces several challenges, such as drug resistance and low response rates, and the need for new anticancer agents is compelling. The drug discovery process is long, costly, and prone to high failure rates. With the rapid expansion of biological and chemical "big data", computational techniques such as machine learning have been increasingly employed to speed up the whole process and reduce its cost. Machine learning algorithms can build complex models that predict the biological activity of compounds against several targets from their chemical properties. These models are known as multi-target Quantitative Structure-Activity Relationship (mt-QSAR) models and can be used to virtually screen small and large chemical libraries for new molecules with anticancer activity. The aim of my Ph.D. project was to employ machine learning techniques to build an mt-QSAR classification model for the prediction of cytotoxic drugs simultaneously active against 43 hematological cancer cell lines. For this purpose, I first constructed a large and diversified dataset of molecules extracted from the ChEMBL database. Then, I compared the performance of different ML classification algorithms and identified Random Forest as the one returning the best predictions. Finally, I used different approaches to maximize the performance of the model, which achieved an accuracy of 88% by correctly classifying 93% of inactive molecules and 72% of active molecules in a validation set. This model was further applied to the virtual screening of a small dataset of molecules tested in our laboratory, where it classified all molecules correctly (100% accuracy). This result is confirmed by our previous in vitro experiments.
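The three validation figures quoted above are linked by a simple identity: overall accuracy is the class-weighted average of the two per-class rates. The sketch below is illustrative only. It assumes all three percentages refer to the same validation set and are exact, which the abstract does not state, and the function names are hypothetical.

```python
def overall_accuracy(recall_active, recall_inactive, active_fraction):
    """Accuracy as the class-weighted average of the two per-class recalls."""
    return active_fraction * recall_active + (1.0 - active_fraction) * recall_inactive


def implied_active_fraction(accuracy, recall_active, recall_inactive):
    """Solve the weighted-average identity for the share of active molecules."""
    if recall_active == recall_inactive:
        raise ValueError("class share is not identifiable when both recalls are equal")
    return (recall_inactive - accuracy) / (recall_inactive - recall_active)


# Rounded figures quoted in the abstract, so the result is only approximate.
share = implied_active_fraction(accuracy=0.88, recall_active=0.72, recall_inactive=0.93)
print(f"implied share of active molecules: {share:.2f}")
```

Under these assumptions, roughly a quarter of the validation molecules would be active. This helps explain why the overall accuracy sits closer to the inactive-class rate than to the active-class rate.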

Relevance:

30.00%

Abstract:

The aim of my dissertation is to study the gender wage gap with a specific focus on developing and transition countries. In the first chapter I present the main theories proposed to analyse the gender wage gap, and I review the empirical literature on the gender wage gap in developing and transition countries and its main findings. Then, I discuss the general empirical issues related to the estimation of the gender wage gap and the issues specific to developing and transition countries. The second chapter is an empirical analysis of the gender wage gap in a developing country, the Union of Comoros, using data from the multidimensional household budget survey “Enquete integrale auprès des ménages” (EIM) run in 2004. My work aims to provide a benchmark analysis for further studies on the situation of women in the Comorian labour market and to contribute to the literature on the gender wage gap in Africa by making available more information on the dynamics and mechanisms of the gender wage gap, given the limited interest in the topic in this area of the world. The third chapter is an applied analysis of the gender wage gap in a transition country, Poland, using data from the Labour Force Survey (LFS) collected for the years 1994 and 2004. I provide a detailed examination of how gender earnings differentials changed between 1994 and a more advanced transition phase in 2004, when market elements had become much more important in the functioning of the Polish economy than in the earlier phase. The main contribution of my dissertation is the application of the econometric methodology that I describe at the beginning of the second chapter. First, I run a preliminary OLS and quantile regression analysis to estimate and describe the raw and conditional wage gaps along the distribution. Second, I estimate quantile regressions separately for males and females, in order to allow for different rewards to characteristics. Third, I decompose the raw wage gap estimated at the mean through the Oaxaca-Blinder (1973) procedure. In the second chapter I run a two-step Heckman procedure by estimating a model of participation in the labour market, which shows a significant selection bias for females. Fourth, I apply the Machado-Mata (2005) technique to extend the decomposition analysis to all points of the distribution. For Poland I can also implement the Juhn, Murphy and Pierce (1991) decomposition over the period 1994-2004, to account for effects on the pay gap due to changes in overall wage dispersion beyond Oaxaca’s standard decomposition.
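The mean decomposition mentioned in the third step can be sketched in a few lines. This is a minimal illustration, not the dissertation's specification: it assumes a single regressor, uses the male coefficients as the reference structure (a choice the abstract does not specify), applies no Heckman selection correction, and runs on made-up toy data.

```python
from statistics import mean


def ols_1d(x, y):
    """Ordinary least squares for y = a + b*x with a single regressor."""
    mx, my = mean(x), mean(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    b = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / sxx
    return my - b * mx, b


def oaxaca_blinder(x_m, y_m, x_f, y_f):
    """Twofold decomposition of the mean gap, male coefficients as reference."""
    a_m, b_m = ols_1d(x_m, y_m)
    a_f, b_f = ols_1d(x_f, y_f)
    explained = (mean(x_m) - mean(x_f)) * b_m            # differences in characteristics
    unexplained = (a_m - a_f) + mean(x_f) * (b_m - b_f)  # differences in returns
    return explained, unexplained


# Made-up toy data: x = years of schooling, y = log hourly wage.
x_m, y_m = [8, 10, 12, 14, 16], [1.9, 2.1, 2.4, 2.6, 2.9]
x_f, y_f = [8, 9, 11, 12, 14], [1.7, 1.8, 2.0, 2.1, 2.3]
explained, unexplained = oaxaca_blinder(x_m, y_m, x_f, y_f)
raw_gap = mean(y_m) - mean(y_f)
print(f"raw gap {raw_gap:.3f} = explained {explained:.3f} + unexplained {unexplained:.3f}")
```

Because OLS with an intercept fits the group means exactly, the two components add up to the raw mean gap by construction.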

Relevance:

30.00%

Abstract:

The work presented in this thesis focuses on the open-ended coaxial-probe frequency-domain reflectometry technique for measuring the complex permittivity of dispersive dielectric multilayer materials at microwave frequencies. An effective dielectric model is introduced and validated to extend the applicability of this technique to multilayer materials in an on-line system context. In addition, the thesis presents: 1) a numerical study of imperfect contact at the probe-material interface, 2) a review of the available models and techniques, 3) a new classification of the extraction schemes, with guidelines on how they can be used to improve the overall performance of the probe according to the problem requirements.

Relevance:

30.00%

Abstract:

The diagnosis, grading and classification of tumours have benefited considerably from the development of DCE-MRI, which is now essential to the adequate clinical management of many tumour types because it can detect active angiogenesis. Several strategies have been proposed for DCE-MRI evaluation. Visual inspection of contrast agent concentration curves versus time is a very simple yet operator-dependent procedure; more objective approaches have therefore been developed to facilitate comparison between studies. In so-called model-free approaches, descriptive or heuristic information extracted from raw time-series data has been used for tissue classification. The main issue with these schemes is that they do not have a direct interpretation in terms of the physiological properties of the tissues. Model-based investigations, on the other hand, typically involve compartmental tracer kinetic modelling and pixel-by-pixel estimation of kinetic parameters via non-linear regression applied to regions of interest suitably selected by the physician. This approach has the advantage of providing parameters directly related to the pathophysiological properties of the tissue, such as vessel permeability, local regional blood flow, extraction fraction, and the concentration gradient between plasma and the extravascular-extracellular space. However, nonlinear modelling is computationally demanding, and the accuracy of the estimates can be affected by the signal-to-noise ratio and by the initial solutions. The principal aim of this thesis is to investigate the use of semi-quantitative and quantitative parameters for the segmentation and classification of breast lesions. The objectives can be subdivided as follows: to describe the principal techniques for evaluating time-intensity curves in DCE-MRI, with a focus on the kinetic models proposed in the literature; to evaluate the influence of the parametrization choice for classic bi-compartmental kinetic models; to evaluate the performance of a method for simultaneous tracer kinetic modelling and pixel classification; and to evaluate the performance of machine learning techniques trained for the segmentation and classification of breast lesions.
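To make the "model-free" descriptors concrete, the sketch below extracts a few semi-quantitative parameters from a single time-intensity curve. The parameter names and definitions (relative enhancement, time to peak, wash-in slope, washout) are common conventions assumed here for illustration. The abstract does not say which descriptors the thesis uses, and the curve is made up.

```python
def semi_quantitative(times, signal, n_baseline=2):
    """Descriptive parameters of one pixel's time-intensity curve."""
    baseline = sum(signal[:n_baseline]) / n_baseline
    rel = [(s - baseline) / baseline for s in signal]   # relative enhancement
    i_peak = max(range(len(rel)), key=rel.__getitem__)
    t0 = times[n_baseline - 1]                          # last pre-contrast sample
    wash_in = rel[i_peak] / (times[i_peak] - t0) if i_peak >= n_baseline else 0.0
    wash_out = rel[-1] - rel[i_peak]                    # <= 0; negative means washout
    return {
        "peak_enhancement": rel[i_peak],
        "time_to_peak": times[i_peak] - t0,
        "wash_in_slope": wash_in,
        "wash_out": wash_out,
    }


# Made-up curve: time in seconds, signal in arbitrary units.
times = [0, 30, 60, 90, 120, 180, 240]
signal = [100, 100, 160, 200, 190, 180, 170]
params = semi_quantitative(times, signal)
print(params)
```

Descriptors like these are cheap to compute per pixel, which is their appeal. The trade-off noted in the abstract is that they lack a direct physiological interpretation, unlike the parameters of a compartmental model.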

Relevance:

30.00%

Abstract:

During recent decades, economists' interest in gender-related issues has risen. Researchers aim to show how economic theory can be applied to gender-related topics such as peer effects, labor market outcomes, and education. This dissertation aims to contribute to our understanding of the interaction, inequality and sources of differences across genders, and it consists of three empirical papers in the research area of gender economics. The aim of the first paper ("Separating gender composition effect from peer effects in education") is to demonstrate the importance of considering endogenous peer effects in order to identify the gender composition effect. This is illustrated analytically using Manski's (1993) linear-in-means model. The paper derives an innovative solution to the simultaneous identification of endogenous and exogenous peer effects: the gender composition effect of interest is estimated from auxiliary reduced-form estimates after identifying the endogenous peer effect with Graham's (2008) variance restriction method. The paper applies this methodology to two different data sets from American and Italian schools. The motivation of the second paper ("Gender differences in vulnerability to an economic crisis") is to analyze the different effects of the recent economic crisis on the labor market outcomes of men and women. Using a triple-differences method (before-after crisis, harder-milder hit sectors, men-women), the paper uses British data at the occupation level and shows that men suffer more than women in terms of the probability of losing their job. Several explanations for the findings are proposed. The third paper ("Gender gap in educational outcome") is concerned with a controversial academic debate on the existence, degree and origin of the gender gap in test scores. The existence of a gap both in mean scores and in the variability around the mean is documented and analyzed. The origins of the gap are investigated by looking at a wide range of possible explanations.
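The triple-differences logic of the second paper can be illustrated with cell means alone. This is a sketch under strong simplifications: the rates are made up, the cell labels are hypothetical, and there are no controls or standard errors. The paper's actual estimator is presumably regression-based.

```python
def triple_difference(rate):
    """DDD estimate from eight cell means indexed by (sex, sector, period)."""
    def did(sex):
        hard = rate[(sex, "hard", "after")] - rate[(sex, "hard", "before")]
        mild = rate[(sex, "mild", "after")] - rate[(sex, "mild", "before")]
        return hard - mild  # difference-in-differences within one sex
    return did("men") - did("women")


# Made-up job-loss rates, for illustration only.
rate = {
    ("men", "hard", "before"): 0.04, ("men", "hard", "after"): 0.10,
    ("men", "mild", "before"): 0.03, ("men", "mild", "after"): 0.05,
    ("women", "hard", "before"): 0.04, ("women", "hard", "after"): 0.07,
    ("women", "mild", "before"): 0.03, ("women", "mild", "after"): 0.04,
}
ddd = triple_difference(rate)
print(f"triple-difference estimate: {ddd:+.3f}")
```

A positive value here means the crisis raised job-loss risk in hard-hit sectors by more for men than for women, net of sector-wide and sex-wide trends.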

Relevance:

30.00%

Abstract:

Information is nowadays a key resource: machine learning and data mining techniques have been developed to extract high-level information from large amounts of data. As most data comes in the form of unstructured text in natural languages, research on text mining is currently very active and deals with practical problems. Among these, text categorization deals with the automatic organization of large quantities of documents into predefined taxonomies of topic categories, possibly arranged in large hierarchies. In commonly proposed machine learning approaches, classifiers are automatically trained from pre-labeled documents: they can perform very accurate classification, but often require a substantial training set and notable computational effort. Methods for cross-domain text categorization have been proposed, which make it possible to leverage a set of labeled documents from one domain to classify those of another. Most methods use advanced statistical techniques, usually involving parameter tuning. A first contribution presented here is a method based on nearest centroid classification, where profiles of categories are generated from the known domain and then iteratively adapted to the unknown one. Despite being conceptually simple and having easily tuned parameters, this method achieves state-of-the-art accuracy on most benchmark datasets with fast running times. A second, deeper contribution involves the design of a domain-independent model to distinguish the degree and type of relatedness between arbitrary documents and topics, inferred from the different types of semantic relationships between their respective representative words, identified by specific search algorithms. The application of this model is tested on both flat and hierarchical text categorization, where it potentially allows the efficient addition of new categories during classification. Results show that classification accuracy still requires improvement, but that models generated from one domain can be effectively reused in a different one.
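The first contribution, nearest centroid classification with iterative adaptation, can be sketched as follows. The update rule here (reassign target documents, then rebuild each profile from the target documents assigned to it) is an assumption made for illustration, since the abstract does not give the thesis's exact procedure. The bag-of-words representation and the toy documents are also hypothetical.

```python
import math
from collections import Counter, defaultdict


def cosine(a, b):
    dot = sum(v * b.get(k, 0.0) for k, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroids(docs, labels):
    """Sum term counts per label (cosine similarity ignores the scale)."""
    sums = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        sums[label].update(doc)
    return dict(sums)


def adapt(source_docs, source_labels, target_docs, max_iter=10):
    """Start from source-domain profiles, then re-estimate them on the target domain."""
    profiles = centroids(source_docs, source_labels)
    assigned = None
    for _ in range(max_iter):
        new = [max(profiles, key=lambda c: cosine(d, profiles[c])) for d in target_docs]
        if new == assigned:  # assignments are stable: stop
            break
        assigned = new
        updated = centroids(target_docs, assigned)
        # Keep the previous profile for any category that received no documents.
        profiles = {c: updated.get(c, profiles[c]) for c in profiles}
    return assigned


def bow(text):
    return Counter(text.lower().split())


source = [bow("goal match team"), bow("team coach match"),
          bow("stock market bank"), bow("bank rate market")]
labels = ["sport", "sport", "finance", "finance"]
target = [bow("match referee goal"), bow("referee penalty team"),
          bow("market shares investor"), bow("investor shares")]
result = adapt(source, labels, target)
print(result)
```

In this toy run, the last target document shares no words with the source domain, so the source profiles alone cannot place it. After one adaptation round the finance profile has absorbed target-domain vocabulary, and the document is assigned accordingly.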

Relevance:

30.00%

Abstract:

In recent years, great effort has been put into the development of new techniques for automatic object classification, partly because of its impact on applications such as medical imaging and driverless cars. To this end, several mathematical models have been developed, from logistic regression to neural networks. A crucial aspect of these so-called classification algorithms is the use of algebraic tools to represent and approximate the input data. In this thesis, we examine two different models for image classification based on a particular tensor decomposition named Tensor-Train (TT) decomposition. The use of tensor approaches preserves the multidimensional structure of the data and the neighboring relations among pixels. Furthermore, the Tensor-Train, unlike other tensor decompositions, does not suffer from the curse of dimensionality, making it an extremely powerful strategy when dealing with high-dimensional data. It also allows data compression when combined with truncation strategies that reduce memory requirements without spoiling classification performance. The first model we propose is based on a direct decomposition of the database by means of the TT decomposition to find basis vectors used to classify a new object. The second model is a tensor dictionary learning model based on the TT decomposition, where the terms of the decomposition are estimated using a proximal alternating linearized minimization algorithm with a spectral stepsize.

Relevance:

30.00%

Abstract:

Idiopathic pulmonary fibrosis (IPF) is a chronic progressive disease with no curative pharmacological treatment. Animal models play an essential role in revealing the molecular mechanisms involved in the pathogenesis of the disease. Bleomycin (BLM)-induced lung fibrosis is the most widely used and best characterized model for anti-fibrotic drug screening. However, several issues have been reported, such as the identification of an optimal BLM dose and administration scheme, as well as gender-specificity. Moreover, the balance between disease resolution, an appropriate time window for therapeutic intervention, and animal welfare remains a critical aspect yet to be fully elucidated. In this thesis, micro-CT imaging was used as a tool to identify the ideal BLM dose regimen to induce sustained lung fibrosis in mice, as well as to assess the anti-fibrotic effect of Nintedanib (NINT) treatment under this BLM administration regimen. In order to select the optimal BLM dose scheme, C57BL/6 male mice were treated with BLM via oropharyngeal aspiration (OA), following either double or triple BLM administration. The triple BLM administration proved the most promising scheme, able to balance disease resolution, an appropriate time window for therapeutic intervention, and animal welfare. Fibrosis progression was assessed longitudinally by micro-CT every 7 days for 5 weeks after BLM administration, and 5 animals were sacrificed at each timepoint for BALF and histological evaluation. The anti-fibrotic effect of NINT was assessed following different treatment regimens in this model. Herein, we have developed an optimized mouse model of pulmonary fibrosis that provides a three-week therapeutic window to screen putative anti-fibrotic drugs. Micro-CT scanning allowed us to monitor the progression of lung fibrosis and the therapeutic response longitudinally in the same subject, drastically reducing the number of animals involved in the experiment.

Relevance:

30.00%

Abstract:

In the first chapter, “Political power and the influence of minorities: theory and evidence from Italy”, I analyze the relationship between minority and majority in politics, and how it can influence policy outcomes. I first present a theoretical model describing the possible consequences of an increase in a minority’s political power and show how it can make it harder for parties to reach a compromise on policy outcomes. I then empirically test these implications by exploiting the introduction in 2012 of a gender quota in Italian local elections: the increase in female politicians had heterogeneous effects on the level of funding for daycare, depending on its differential effects on the share of women councillors. The second chapter, “Marriage patterns and the gender gap in labor force participation: evidence from Italy”, presents evidence highlighting a new possible determinant of the large gender gap in the Italian labor force: endogamy intensity. I argue that endogamy helps preserve social norms stigmatizing working women and reduces the probability of divorce, which discourages women’s participation in the labor force. Endogamy is proxied by the degree of concentration of the surname distribution, and I provide evidence that a more intense custom of endogamy contributed to enlarging gender participation gaps across Italian municipalities in 2001. The third chapter, “Information and quality of politicians: is transparency helping voters?”, studies how voting choices are affected by giving voters more personal information on candidates. I exploit the introduction of the “Spazzacorrotti” law in Italy in 2019, which required candidates in local elections to publish their CVs and criminal records before elections. I find no effects on elected candidates’ age, gender, educational level, or ideology. Moreover, I present anecdotal evidence that candidates with a criminal record received fewer votes on average, but only when local media exposed it.

Relevance:

30.00%

Abstract:

The dissertation addresses the still unresolved challenges of source-based digital 3D reconstruction, visualisation and documentation in the domains of archaeology and art and architecture history. The emerging BIM methodology and the IFC data exchange format are changing the way of collaboration, visualisation and documentation in the planning, construction and facility management process. The introduction and development of the Semantic Web (Web 3.0), spreading the idea of structured, formalised and linked data, offers semantically enriched human- and machine-readable data. In contrast to civil engineering and cultural heritage, academic object-oriented disciplines, like archaeology and art and architecture history, are acting as outside spectators. Since the 1990s, it has been argued that a 3D model is not likely to be considered a scientific reconstruction unless it is grounded on accurate documentation and visualisation. However, these standards are still missing and the outcomes are not validated. Meanwhile, the digital research data remain ephemeral and continue to fill the growing digital cemeteries. This study therefore focuses on the evaluation of source-based digital 3D reconstructions and, especially, on uncertainty assessment in the case of hypothetical reconstructions of destroyed or never built artefacts according to scientific principles, making the models shareable and reusable by a potentially wide audience. The work initially focuses on terminology and on the definition of a workflow especially related to the classification and visualisation of uncertainty. The workflow is then applied to specific cases of 3D models uploaded to the DFG repository of the AI Mainz. In this way, the available methods of documenting, visualising and communicating uncertainty are analysed. In the end, this process will lead to a validation or a correction of the workflow and the initial assumptions, but also (dealing with different hypotheses) to a better definition of the levels of uncertainty.

Relevance:

30.00%

Abstract:

In this work, we explore and demonstrate the potential for modeling and classification using quantile-based distributions, which are distributions defined by their quantile function. In the first part we formalize a least squares estimation framework for the class of linear quantile functions, leading to unbiased and asymptotically normal estimators. Among the distributions with a linear quantile function, we focus on the flattened generalized logistic distribution (fgld), which offers a wide range of distributional shapes. A novel naïve-Bayes classifier is proposed that utilizes the fgld estimated via least squares, and through simulations and applications, we demonstrate its competitiveness against state-of-the-art alternatives. In the second part we consider the Bayesian estimation of quantile-based distributions. We introduce a factor model with independent latent variables, which are distributed according to the fgld. Similar to the independent factor analysis model, this approach accommodates flexible factor distributions while using fewer parameters. The model is presented within a Bayesian framework, an MCMC algorithm for its estimation is developed, and its effectiveness is illustrated with data from the European Social Survey. The third part focuses on depth functions, which extend the concept of quantiles to multivariate data by imposing a center-outward ordering in the multivariate space. We investigate the recently introduced integrated rank-weighted (IRW) depth function, which is based on the distribution of random spherical projections of the multivariate data. This depth function proves to be computationally efficient, and to increase its flexibility we propose different methods to explicitly model the projected univariate distributions. Its usefulness is shown in classification tasks: the maximum depth classifier based on the IRW depth is proven to be asymptotically optimal under certain conditions, and classifiers based on the IRW depth are shown to perform well in simulated and real data experiments.
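The idea of least squares estimation for a quantile function that is linear in its parameters can be sketched with the simplest member of that class. The example below uses the plain logistic quantile function Q(u) = loc + scale * log(u / (1 - u)) as a stand-in, not the fgld, whose exact form the abstract does not give. The plotting positions i/(n+1) are also an assumption, so this is not necessarily the estimator formalized in the thesis.

```python
import math
import random
from statistics import mean


def fit_linear_quantile(sample):
    """Least squares fit of Q(u) = loc + scale * log(u / (1 - u)) to order statistics."""
    xs = sorted(sample)
    n = len(xs)
    # Regress the i-th order statistic on the logit of its plotting position.
    basis = [math.log(u / (1 - u)) for u in ((i + 1) / (n + 1) for i in range(n))]
    mb, mx = mean(basis), mean(xs)
    sbb = sum((b - mb) ** 2 for b in basis)
    scale = sum((b - mb) * (x - mx) for b, x in zip(basis, xs)) / sbb
    return mx - scale * mb, scale


# Simulate logistic draws by inverse-transform sampling (made-up parameters).
rng = random.Random(0)
true_loc, true_scale = 2.0, 0.5
draws = []
for _ in range(5000):
    u = min(max(rng.random(), 1e-12), 1 - 1e-12)  # guard against log(0)
    draws.append(true_loc + true_scale * math.log(u / (1 - u)))

loc, scale = fit_linear_quantile(draws)
print(f"loc ~ {loc:.2f}, scale ~ {scale:.2f}")
```

Because the quantile function is linear in (loc, scale), the fit reduces to ordinary linear regression of the sorted sample on a known basis. The same reduction applies to any quantile function that is linear in its parameters.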