7 resultados para data interpretation
em AMS Tesi di Dottorato - Alm@DL - Università di Bologna
Resumo:
This thesis analyses problems related to the applicability, in business environments, of Process Mining tools and techniques. The first contribution is a presentation of the state of the art of Process Mining and a characterization of companies, in terms of their "process awareness". The work continues identifying circumstance where problems can emerge: data preparation; actual mining; and results interpretation. Other problems are the configuration of parameters by not-expert users and computational complexity. We concentrate on two possible scenarios: "batch" and "on-line" Process Mining. Concerning the batch Process Mining, we first investigated the data preparation problem and we proposed a solution for the identification of the "case-ids" whenever this field is not explicitly indicated. After that, we concentrated on problems at mining time and we propose the generalization of a well-known control-flow discovery algorithm in order to exploit non instantaneous events. The usage of interval-based recording leads to an important improvement of performance. Later on, we report our work on the parameters configuration for not-expert users. We present two approaches to select the "best" parameters configuration: one is completely autonomous; the other requires human interaction to navigate a hierarchy of candidate models. Concerning the data interpretation and results evaluation, we propose two metrics: a model-to-model and a model-to-log. Finally, we present an automatic approach for the extension of a control-flow model with social information, in order to simplify the analysis of these perspectives. The second part of this thesis deals with control-flow discovery algorithms in on-line settings. We propose a formal definition of the problem, and two baseline approaches. The actual mining algorithms proposed are two: the first is the adaptation, to the control-flow discovery problem, of a frequency counting algorithm; the second constitutes a framework of models which can be used for different kinds of streams (stationary versus evolving).
Resumo:
Streptococcus agalactiae, also known as Group B Streptococcus (GBS) is the primary colonizer of the anogenital mucosa of up to 40% of healthy women and an important cause of invasive neonatal infections worldwide. Among the 10 known capsular serotypes, GBS type III accounts for 30-76% of the cases of neonatal meningitis. Biofilms are dense aggregates of surface-adherent microorganisms embedded in an exopolysaccharide matrix. Centers for Disease Control and Prevention estimate that 65% of human bacterial infections involve biofilms (Post et al., 2004). In recent years, the ability of GBS to form biofilm attracted attention for its possible role in fitness and/or virulence. Here, a new in vitro biofilm formation protocol was developed to guarantee more stringent conditions, to better discriminate between strong-, low- and non- biofilm forming strains and reduce ambiguous data interpretation. This protocol was applied to screen the in vitro biofilm formation ability of more than 350 GBS clinical isolates from pregnant women and neonatal infections belonging to different serotype, in relation to media composition and pH. The results showed the enhancement of GBS biofilm formation in acidic condition and identified a subset of isolates belonging to serotypes III and V that forms strong biofilms in these conditions. Interestingly, the best biofilm formers belonged to the serotype III hypervirulent clone ST-17.It was also found that pH 5.0 induces down-regulation of the capsule but that this reduction is not enough by itself to ensure biofilm formation. Moreover, the ability of proteinase K to strongly inhibit biofilm formation and to disaggregate mature biofilms suggested that proteins play an essential role in promoting GBS biofilm formation and contribute to the biofilm structural stability. Finally, a set of proteins potentially expressed during the GBS in vitro biofilm formation were identified by mass spectrometry.
Resumo:
This work is about the role that environment plays in the production of evolutionary significant variations. It starts with an historical introduction about the concept of variation and the role of environment in its production. Then, I show how a lack of attention to these topics may lead to serious mistakes in data interpretation. A statistical re-analysis of published data on the effects of malnutrition on dental eruption, shows that what has been interpreted as an increase in the mean value, is actually linked to increase of variability. In Chapter 3 I present the topic of development as a link between variability and environmental influence, giving a review of the possible mechanisms by which development influences evolutionary dynamics. Chapter 4 is the core chapter of the thesis; I investigated the role of environment in the development of dental morphology. I used dental hypoplasia as a marker of stress, characterizing two groups. Comparing the morphology of upper molars in the two groups, three major results came out: (i) there is a significant effect of environmental stressors on the overall morphology of upper molars; (ii) the developmental response increases morphological variability of the stressed population; (iii) increase of variability is directional: stressed individuals have increased cusps dimensions and number. I also hypothesized the molecular mechanisms that could be responsible of the observed effects. In Chapter 5, I present future perspectives for developing this research. The direction of dental development response is the same direction of the trend in mammalian dental evolution. Since malnutrition triggers the developmental response, and this particular kind of stressor must have been very common in our class evolutionary history, I propose the possibility that environmental stress actively influenced mammals evolution. Moreover, I discuss the possibility of reconsidering the role of natural selection in the evolution of dental morphology.
Resumo:
The main aim of this Ph.D. dissertation is the study of clustering dependent data by means of copula functions with particular emphasis on microarray data. Copula functions are a popular multivariate modeling tool in each field where the multivariate dependence is of great interest and their use in clustering has not been still investigated. The first part of this work contains the review of the literature of clustering methods, copula functions and microarray experiments. The attention focuses on the K–means (Hartigan, 1975; Hartigan and Wong, 1979), the hierarchical (Everitt, 1974) and the model–based (Fraley and Raftery, 1998, 1999, 2000, 2007) clustering techniques because their performance is compared. Then, the probabilistic interpretation of the Sklar’s theorem (Sklar’s, 1959), the estimation methods for copulas like the Inference for Margins (Joe and Xu, 1996) and the Archimedean and Elliptical copula families are presented. In the end, applications of clustering methods and copulas to the genetic and microarray experiments are highlighted. The second part contains the original contribution proposed. A simulation study is performed in order to evaluate the performance of the K–means and the hierarchical bottom–up clustering methods in identifying clusters according to the dependence structure of the data generating process. Different simulations are performed by varying different conditions (e.g., the kind of margins (distinct, overlapping and nested) and the value of the dependence parameter ) and the results are evaluated by means of different measures of performance. In light of the simulation results and of the limits of the two investigated clustering methods, a new clustering algorithm based on copula functions (‘CoClust’ in brief) is proposed. The basic idea, the iterative procedure of the CoClust and the description of the written R functions with their output are given. The CoClust algorithm is tested on simulated data (by varying the number of clusters, the copula models, the dependence parameter value and the degree of overlap of margins) and is compared with the performance of model–based clustering by using different measures of performance, like the percentage of well–identified number of clusters and the not rejection percentage of H0 on . It is shown that the CoClust algorithm allows to overcome all observed limits of the other investigated clustering techniques and is able to identify clusters according to the dependence structure of the data independently of the degree of overlap of margins and the strength of the dependence. The CoClust uses a criterion based on the maximized log–likelihood function of the copula and can virtually account for any possible dependence relationship between observations. Many peculiar characteristics are shown for the CoClust, e.g. its capability of identifying the true number of clusters and the fact that it does not require a starting classification. Finally, the CoClust algorithm is applied to the real microarray data of Hedenfalk et al. (2001) both to the gene expressions observed in three different cancer samples and to the columns (tumor samples) of the whole data matrix.
Resumo:
Osteoarthritis (OA) or degenerative joint disease (DJD) is a pathology which affects the synovial joints and characterised by a focal loss of articular cartilage and subsequent bony reaction of the subcondral and marginal bone. Its etiology is best explained by a multifactorial model including: age, sex, genetic and systemic factors, other predisposing diseases and functional stress. In this study the results of the investigation of a modern identified skeletal collection will be presented. In particular, we will focus on the relationship between the presence of OA at various joints. The joint modifications have been analysed using a new methodology that allows the scoring of different degrees of expression of the features considered. Materials and Methods The sample examined comes from the Sassari identified skeletal collection (part of “Frassetto collections”). The individuals were born between 1828 and 1916 and died between 1918 and 1932. Information about sex and age is known for all the individuals. The occupation is known for 173 males and 125 females. Data concerning the occupation of the individuals indicate a preindustrial and rural society. OA has been diagnosed when eburnation (EB) or loss of morphology (LM) were present, or when at least two of the following: marginal lipping (ML), esostosis (EX) or erosion (ER), were present. For each articular surface affected a “mean score” was calculated, reflecting the “severity” of the alterations. A further “score” was calculated for each joint. In the analysis sexes and age classes were always kept separate. For the statistical analyses non parametric test were used. Results The results show there is an increase of OA with age in all the joints analyzed and in particular around 50 years and 60 years. The shoulder, the hip and the knee are the joints mainly affected with ageing while the ankle is the less affected; the correlation values confirm this result. The lesion which show the major correlation with age is the ML. In our sample males are more frequently and more severely affected by OA than females, particularly at the superior limbs, while hip and knee are similarly affected in the two sexes. Lateralization shows some positive results in particular in the right shoulder of males and in various articular surfaces especially of the superior limb of both males and females; articular surfaces and joints are quite always lateralized to the right. Occupational analyses did not show remarkable results probably because of the homogeneity of the sample; males although performing different activities are quite all employed in stressful works. No highest prevalence of knee and hip OA was found in farm-workers respect to the other males. Discussion and Conclusion In this work we propose a methodology to score the different features, necessary to diagnose OA, that allows the investigation of the severity of joint degeneration. This method is easier than the one proposed by Buikstra and Ubelaker (1994), but in the same time allows a quite detailed recording of the features. Epidemiological results can be interpreted quite simply and they are in accordance with other studies; more difficult is the interpretation of the occupational results because many questions concerning the activities performed by the individuals of the collection during their lifespan cannot be solved. Because of this, caution is suggested in the interpretation of bioarcheological specimens. With this work we hope to contribute to the discussion on the puzzling problem of the etiology of OA. The possibility of studying identified skeletons will add important data to the description of osseous features of OA, enriching the medical documentation, based on different criteria. Even if we are aware that the clinical diagnosis is different from the palaeopathological one we think our work will be useful in clarifying some epidemiological as well as pathological aspects of OA.
Resumo:
The discovery of the Cosmic Microwave Background (CMB) radiation in 1965 is one of the fundamental milestones supporting the Big Bang theory. The CMB is one of the most important source of information in cosmology. The excellent accuracy of the recent CMB data of WMAP and Planck satellites confirmed the validity of the standard cosmological model and set a new challenge for the data analysis processes and their interpretation. In this thesis we deal with several aspects and useful tools of the data analysis. We focus on their optimization in order to have a complete exploitation of the Planck data and contribute to the final published results. The issues investigated are: the change of coordinates of CMB maps using the HEALPix package, the problem of the aliasing effect in the generation of low resolution maps, the comparison of the Angular Power Spectrum (APS) extraction performances of the optimal QML method, implemented in the code called BolPol, and the pseudo-Cl method, implemented in Cromaster. The QML method has been then applied to the Planck data at large angular scales to extract the CMB APS. The same method has been applied also to analyze the TT parity and the Low Variance anomalies in the Planck maps, showing a consistent deviation from the standard cosmological model, the possible origins for this results have been discussed. The Cromaster code instead has been applied to the 408 MHz and 1.42 GHz surveys focusing on the analysis of the APS of selected regions of the synchrotron emission. The new generation of CMB experiments will be dedicated to polarization measurements, for which are necessary high accuracy devices for separating the polarizations. Here a new technology, called Photonic Crystals, is exploited to develop a new polarization splitter device and its performances are compared to the devices used nowadays.
Resumo:
This thesis is a collection of works focused on the topic of Earthquake Early Warning, with a special attention to large magnitude events. The topic is addressed from different points of view and the structure of the thesis reflects the variety of the aspects which have been analyzed. The first part is dedicated to the giant, 2011 Tohoku-Oki earthquake. The main features of the rupture process are first discussed. The earthquake is then used as a case study to test the feasibility Early Warning methodologies for very large events. Limitations of the standard approaches for large events arise in this chapter. The difficulties are related to the real-time magnitude estimate from the first few seconds of recorded signal. An evolutionary strategy for the real-time magnitude estimate is proposed and applied to the single Tohoku-Oki earthquake. In the second part of the thesis a larger number of earthquakes is analyzed, including small, moderate and large events. Starting from the measurement of two Early Warning parameters, the behavior of small and large earthquakes in the initial portion of recorded signals is investigated. The aim is to understand whether small and large earthquakes can be distinguished from the initial stage of their rupture process. A physical model and a plausible interpretation to justify the observations are proposed. The third part of the thesis is focused on practical, real-time approaches for the rapid identification of the potentially damaged zone during a seismic event. Two different approaches for the rapid prediction of the damage area are proposed and tested. The first one is a threshold-based method which uses traditional seismic data. Then an innovative approach using continuous, GPS data is explored. Both strategies improve the prediction of large scale effects of strong earthquakes.