4 resultados para capability data
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in each field where the multivariate dependence is of great interest and their use in clustering has not been still investigated. The first part of this work contains the review of the literature of clustering methods, copula functions and microarray experiments. The attention focuses on the K–means (Hartigan, 1975; Hartigan and Wong, 1979), the hierarchical (Everitt, 1974) and the model–based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques because their performance is compared. Then, the probabilistic interpretation of the Sklar’s theorem (Sklar’s, 1959), the estimation methods for copulas like the Inference for Margins (Joe and Xu, 1996) and the Archimedean and Elliptical copula families are presented. In the end, applications of clustering methods and copulas to the genetic and microarray experiments are highlighted. The second part contains the original contribution proposed. A simulation study is performed in order to evaluate the performance of the K–means and the hierarchical bottom–up clustering methods in identifying clusters according to the dependence structure of the data generating process. Different simulations are performed by varying different conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter ) and the results are evaluated by means of different measures of performance. In light of the simulation results and of the limits of the two investigated clustering methods, a new clustering algorithm based on copula functions (‘CoClust’ in brief) is proposed. The basic idea, the iterative procedure of the CoClust and the description of the written R functions with their output are given. The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of margins) and is compared with the performance of model–based clustering by using different measures of performance, like the percentage of well–identified number of clusters and the not rejection percentage of H0 on . It is shown that the CoClust algorithm allows to overcome all observed limits of the other investigated clustering techniques and is able to identify clusters according to the dependence structure of the data independently of the degree of overlap of margins and the strength of the dependence. The CoClust uses a criterion based on the maximized log–likelihood function of the copula and can virtually account for any possible dependence relationship between observations. Many peculiar characteristics are shown for the CoClust, e.g. its capability of identifying the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001) both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.
Resumo:
The present research aims at shedding light on the demanding puzzle characterizing the issue of child undernutrition in India. Indeed, the so called ‘Indian development paradox’ identifies the phenomenon according to which higher level of income per capita is recorded alongside a lethargic reduction in the proportion of underweight children aged below three years. Thus, in the time period occurring from 2000 to 2005, real Gross Domestic Production per capita has annually grown at 5.4%, whereas the proportion of children who are underweight has declined from 47% to 46%, a mere one point percent. Such trend opens up the space for discussing the traditionally assumed linkage between income-poverty and undernutrition as well as food intervention as the main focus of policies designed to fight child hunger. Also, it unlocks doors for evaluating the role of an alternative economic approach aiming at explaining undernutrition, such as the Capability Approach. The Capability Approach argues for widening the informational basis to account not only for resources, but also for variables related to liberties, opportunities and autonomy in pursuing what individuals value.The econometric analysis highlights the relevance of including behavioral factors when explaining child undernutrition. In particular, the ability of the mother to move freely in the community without the need of asking permission to her husband or mother-in-law is statistically significant when included in the model, which accounts also for confounding traditional variables, such as economic wealth and food security. Also, focusing on agency, results indicates the necessity of measuring autonomy in different domains and the need of improving the measurement scale for agency data, especially with regards the domain of household duties. Finally, future research is required to investigate policy venues for increasing agency in women and in the communities they live in as viable strategy for reducing the plague of child undernutrition in India.
Resumo:
The thesis objectives are to develop new methodologies for study of the space and time variability of Italian upper ocean ecosystem through the combined use of multi-sensors satellite data and in situ observations and to identify the capability and limits of remote sensing observations to monitor the marine state at short and long time scales. Three oceanographic basins have been selected and subjected to different types of analyses. The first region is the Tyrrhenian Sea where a comparative analysis of altimetry and lagrangian measurements was carried out to study the surface circulation. The results allowed to deepen the knowledge of the Tyrrhenian Sea surface dynamics and its variability and to defined the limitations of satellite altimetry measurements to detect small scale marine circulation features. Channel of Sicily study aimed to identify the spatial-temporal variability of phytoplankton biomass and to understand the impact of the upper ocean circulation on the marine ecosystem. An combined analysis of the satellite of long term time series of chlorophyll, Sea Surface Temperature and Sea Level field data was applied. The results allowed to identify the key role of the Atlantic water inflow in modulating the seasonal variability of the phytoplankton biomass in the region. Finally, Italian coastal marine system was studied with the objective to explore the potential capability of Ocean Color data in detecting chlorophyll trend in coastal areas. The most appropriated methodology to detect long term environmental changes was defined through intercomparison of chlorophyll trends detected by in situ and satellite. Then, Italian coastal areas subject to eutrophication problems were identified. This work has demonstrated that satellites data constitute an unique opportunity to define the features and forcing influencing the upper ocean ecosystems dynamics and can be used also to monitor environmental variables capable of influencing phytoplankton productivity.
Resumo:
In the era of the Internet of Everything, a user with a handheld or wearable device equipped with sensing capability has become a producer as well as a consumer of information and services. The more powerful these devices get, the more likely it is that they will generate and share content locally, leading to the presence of distributed information sources and the diminishing role of centralized servers. As of current practice, we rely on infrastructure acting as an intermediary, providing access to the data. However, infrastructure-based connectivity might not always be available or the best alternative. Moreover, it is often the case where the data and the processes acting upon them are of local scopus. Answers to a query about a nearby object, an information source, a process, an experience, an ability, etc. could be answered locally without reliance on infrastructure-based platforms. The data might have temporal validity limited to or bounded to a geographical area and/or the social context where the user is immersed in. In this envisioned scenario users could interact locally without the need for a central authority, hence, the claim of an infrastructure-less, provider-less platform. The data is owned by the users and consulted locally as opposed to the current approach of making them available globally and stay on forever. From a technical viewpoint, this network resembles a Delay/Disruption Tolerant Network where consumers and producers might be spatially and temporally decoupled exchanging information with each other in an adhoc fashion. To this end, we propose some novel data gathering and dissemination strategies for use in urban-wide environments which do not rely on strict infrastructure mediation. While preserving the general aspects of our study and without loss of generality, we focus our attention toward practical applicative scenarios which help us capture the characteristics of opportunistic communication networks.