22 results for Probabilistic latent semantic model
in BORIS: Bern Open Repository and Information System - Bern - Switzerland
Abstract:
This paper presents a shallow dialogue analysis model aimed at human-human dialogues in the context of staff or business meetings. Four components of the model are defined, and several machine learning techniques are used to extract features from dialogue transcripts: maximum entropy classifiers for dialogue acts, latent semantic analysis for topic segmentation, and decision tree classifiers for discourse markers. A rule-based approach is proposed for resolving cross-modal references to meeting documents. The methods are trained and evaluated on a common data set and annotation format. The integration of the components into an automated shallow dialogue parser opens the way to multimodal meeting processing and retrieval applications.
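The abstract does not say how latent semantic analysis is applied to topic segmentation. The following is a minimal sketch assuming a TextTiling-style rule. Utterances are projected into a k-dimensional LSA space by a truncated SVD of a term-by-utterance count matrix, and a topic boundary is hypothesised wherever the cosine similarity between adjacent utterances falls below a threshold. The function name `lsa_segment`, whitespace tokenisation, and the default values of `k` and `threshold` are illustrative assumptions, not the paper's method.

```python
import numpy as np

def lsa_segment(utterances, k=2, threshold=0.2):
    """Return indices i such that a topic boundary is hypothesised before utterance i."""
    # Vocabulary from naive whitespace tokenisation (an assumption of this sketch)
    vocab = sorted({w for u in utterances for w in u.lower().split()})
    row = {w: j for j, w in enumerate(vocab)}
    # Term-by-utterance count matrix: rows are terms, columns are utterances
    X = np.zeros((len(vocab), len(utterances)))
    for col, u in enumerate(utterances):
        for w in u.lower().split():
            X[row[w], col] += 1
    # Truncated SVD: keep only the k largest singular values (the LSA space)
    _, s, Vt = np.linalg.svd(X, full_matrices=False)
    k = min(k, len(s))
    vecs = (np.diag(s[:k]) @ Vt[:k]).T  # one k-dimensional vector per utterance
    boundaries = []
    for i in range(1, len(utterances)):
        a, b = vecs[i - 1], vecs[i]
        denom = np.linalg.norm(a) * np.linalg.norm(b)
        sim = float(a @ b) / denom if denom > 0 else 0.0
        if sim < threshold:  # low similarity between neighbours suggests a topic shift
            boundaries.append(i)
    return boundaries

# Hypothetical meeting transcript with two topics
meeting = [
    "budget cost budget", "cost budget money", "budget money cost",
    "design logo colour", "logo colour design", "colour design logo",
]
print(lsa_segment(meeting))
```

Comparing single utterances is noisy. A practical variant would compare windows of several utterances and would usually compute the SVD on a larger background corpus.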
Abstract:
Software visualizations can provide a concise overview of a complex software system. Unfortunately, as software has no physical shape, there is no 'natural' mapping of software to a two-dimensional space. As a consequence, most visualizations tend to use a layout in which position and distance have no meaning, so the layout typically diverges from one visualization to another. We propose an approach to consistent layout for software visualization, called Software Cartography, in which the position of a software artifact reflects its vocabulary, and distance corresponds to similarity of vocabulary. We use Latent Semantic Indexing (LSI) to map software artifacts to a vector space, and then use Multidimensional Scaling (MDS) to map this vector space down to two dimensions. The resulting consistent layout allows us to develop a variety of thematic software maps that express very different aspects of software while making it easy to compare them. The approach is especially suitable for comparing views of evolving software, as the vocabulary of software artifacts tends to be stable over time. We present a prototype implementation of Software Cartography and illustrate its use with practical examples from numerous open-source case studies.
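The LSI-then-MDS pipeline described above can be sketched as follows. This assumes a term-by-document count matrix, cosine distance in the LSI space, and classical (Torgerson) MDS. The abstract does not specify the distance measure or the MDS variant, and the function names and the toy vocabulary are hypothetical.

```python
import numpy as np

def lsi_vectors(term_doc, k):
    """Project documents (columns of a term-by-document matrix) into a k-dimensional LSI space."""
    _, s, Vt = np.linalg.svd(term_doc, full_matrices=False)
    return (np.diag(s[:k]) @ Vt[:k]).T  # shape (n_docs, k)

def cosine_distances(vectors):
    """Pairwise 1 - cosine similarity between row vectors."""
    norms = np.linalg.norm(vectors, axis=1, keepdims=True)
    unit = vectors / np.where(norms == 0, 1.0, norms)
    return np.clip(1.0 - unit @ unit.T, 0.0, 2.0)

def classical_mds(dist, dims=2):
    """Classical (Torgerson) MDS: coordinates whose Euclidean distances approximate `dist`."""
    n = dist.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n          # centring matrix
    B = -0.5 * J @ (dist ** 2) @ J                # double-centred squared distances
    eigvals, eigvecs = np.linalg.eigh(B)          # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:dims]
    return eigvecs[:, top] * np.sqrt(np.clip(eigvals[top], 0.0, None))

# Hypothetical vocabulary (rows): parser, token, grammar, widget, button, layout.
# Four software artifacts (columns): two about parsing, two about the GUI.
X = np.array([
    [1, 1, 0, 0],
    [1, 1, 0, 0],
    [0, 1, 0, 0],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 0, 1],
], dtype=float)
coords = classical_mds(cosine_distances(lsi_vectors(X, k=2)))
print(coords.round(3))
```

Artifacts with overlapping vocabulary land close together on the map. MDS coordinates are only determined up to translation, rotation and reflection, so a tool aiming at consistent layouts across software versions would likely also need to align successive embeddings.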
Abstract:
Software visualizations can provide a concise overview of a complex software system. Unfortunately, since software has no physical shape, there is no “natural” mapping of software to a two-dimensional space. As a consequence, most visualizations tend to use a layout in which position and distance have no meaning, so the layout typically diverges from one visualization to another. We propose a consistent layout for software maps in which the position of a software artifact reflects its vocabulary, and distance corresponds to similarity of vocabulary. We use Latent Semantic Indexing (LSI) to map software artifacts to a vector space, and then use Multidimensional Scaling (MDS) to map this vector space down to two dimensions. The resulting consistent layout allows us to develop a variety of thematic software maps that express very different aspects of software while making it easy to compare them. The approach is especially suitable for comparing views of evolving software, since the vocabulary of software artifacts tends to be stable over time.
Abstract:
Cloud computing enables the provisioning and distribution of highly scalable services in a reliable, on-demand and sustainable manner. However, managing enterprise distributed applications in cloud environments under Service Level Agreement (SLA) constraints makes it challenging to maintain optimal resource control. Furthermore, conflicting objectives in the management of cloud infrastructure and of distributed applications might lead to SLA violations and inefficient use of hardware and software resources. This dissertation focuses on how SLAs can be used as an input to the cloud management system (CMS), increasing the efficiency of both resource allocation and infrastructure scaling. First, we present an extended SLA semantic model for modelling complex service dependencies in distributed applications and for enabling automated cloud infrastructure management operations. Second, we describe a multi-objective virtual machine (VM) allocation algorithm for optimised resource allocation in infrastructure clouds. Third, we describe a method for discovering relations between the performance indicators of services belonging to distributed applications and for using these relations to build scaling rules that a CMS can use for automated management of VMs. Fourth, we introduce two novel VM-scaling algorithms that optimally scale systems composed of VMs, based on given SLA performance constraints. All presented research works were implemented and tested using enterprise distributed applications.
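The third contribution derives scaling rules from relations between performance indicators. The following is a minimal sketch assuming the relation is a linear least-squares fit between per-VM load and response time, with load spread evenly across VMs. The dissertation's actual models and algorithms are not described in the abstract, and all names and numbers here are hypothetical.

```python
import numpy as np

def fit_linear_relation(load_per_vm, response_ms):
    """Least-squares fit of response_ms ≈ a * load_per_vm + b from monitoring samples."""
    a, b = np.polyfit(load_per_vm, response_ms, deg=1)
    return a, b

def vms_needed(total_load, a, b, sla_ms, max_vms=100):
    """Smallest VM count whose predicted response time meets the SLA limit, or None."""
    for n in range(1, max_vms + 1):
        predicted = a * (total_load / n) + b  # load assumed to be spread evenly over n VMs
        if predicted <= sla_ms:
            return n
    return None  # SLA cannot be met within max_vms

# Hypothetical monitoring samples: requests/s per VM vs. mean response time (ms)
a, b = fit_linear_relation([10, 20, 30, 40], [60, 110, 160, 210])
print(vms_needed(total_load=200, a=a, b=b, sla_ms=120))
```

Returning `None` rather than silently capping at `max_vms` lets the caller tell a satisfiable scaling decision apart from a likely SLA violation.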
Abstract:
Recurrent wheezing or asthma is a common problem in children that has increased considerably in prevalence in the past few decades. The causes and underlying mechanisms are poorly understood, and it is thought that a number of distinct diseases causing similar symptoms are involved. Due to the lack of a biologically founded classification system, children are classified into phenotypes according to their observed disease-related features (symptoms, signs, measurements). The objectives of this PhD project were a) to develop tools for analysing phenotypic variation of a disease, and b) to examine phenotypic variability of wheezing among children by applying these tools to existing epidemiological data. A combination of graphical methods (multivariate correspondence analysis) and statistical models (latent variable models) was used. In a first phase, a model for discrete variability (latent class model) was applied to data on symptoms and measurements from an epidemiological study to identify distinct phenotypes of wheezing. In a second phase, the modelling framework was expanded to include continuous variability (e.g. along a severity gradient) and combinations of discrete and continuous variability (factor models and factor mixture models). The third phase focused on validating the methods using simulation studies. The main body of this thesis consists of 5 articles (3 published, 1 submitted and 1 to be submitted), including applications, methodological contributions and a review. The main findings and contributions were: 1) The application of a latent class model to epidemiological data (symptoms and physiological measurements) yielded plausible phenotypes of wheezing with distinguishing characteristics that have previously been used as phenotype-defining characteristics. 2) A method was proposed for including responses to conditional questions (e.g. questions on severity or triggers of wheezing that are asked only of children with wheeze) in multivariate modelling. 3) A panel of clinicians was set up to agree on a plausible model for wheezing diseases; the model can be used to generate datasets for testing the modelling approach. 4) A critical review of methods for defining and validating phenotypes of wheeze in children was conducted. 5) The simulation studies showed that a parsimonious parameterisation of the models is required to identify the true underlying structure of the data. The developed approach can deal with some challenges of real-life cohort data, such as variables of mixed mode (continuous and categorical), missing data and conditional questions. If carefully applied, the approach can be used to identify whether the underlying phenotypic variation is discrete (classes), continuous (factors) or a combination of these. These methods could help improve the precision of research into causes and mechanisms and contribute to the development of a new classification of wheezing disorders in children and of other diseases that are difficult to classify.
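A latent class model assumes that each child belongs to one of K unobserved classes and that, within a class, the observed features are independent. The following is a minimal expectation-maximisation (EM) sketch for binary indicators. It assumes complete data, no covariates and a fixed number of classes. The thesis also handles mixed-mode variables and missing data, which this sketch does not, and the function name and toy data are hypothetical.

```python
import numpy as np

def fit_latent_class(X, n_classes, n_iter=200, seed=0):
    """EM for a latent class model with binary (0/1) indicators.

    X: array of shape (n_subjects, n_items).
    Returns class weights pi, item probabilities theta[c, j] = P(x_j = 1 | class c),
    and posterior class membership probabilities for each subject.
    """
    rng = np.random.default_rng(seed)
    n, n_items = X.shape
    pi = np.full(n_classes, 1.0 / n_classes)
    theta = rng.uniform(0.25, 0.75, size=(n_classes, n_items))
    for _ in range(n_iter):
        # E-step: log P(x_i, class c), using local independence of items within a class
        log_joint = np.log(pi) + X @ np.log(theta).T + (1 - X) @ np.log(1 - theta).T
        m = log_joint.max(axis=1, keepdims=True)  # stabilise the exponentials
        post = np.exp(log_joint - m)
        post /= post.sum(axis=1, keepdims=True)
        # M-step: re-estimate class sizes and class-specific item probabilities
        nk = post.sum(axis=0)
        pi = nk / n
        theta = np.clip((post.T @ X) / nk[:, None], 1e-6, 1 - 1e-6)
    return pi, theta, post

# Two hypothetical symptom patterns, 50 children each
X = np.array([[1, 1, 1, 0, 0, 0]] * 50 + [[0, 0, 0, 1, 1, 1]] * 50, dtype=float)
pi, theta, post = fit_latent_class(X, n_classes=2)
print(pi.round(2), post.argmax(axis=1)[[0, 50]])
```

In practice the model would be fitted for several values of K from multiple random starts, and the solutions compared with an information criterion such as BIC.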
Abstract:
The optical quality of the human eye mainly depends on the refractive performance of the cornea. The shape of the cornea is a mechanical balance between intraocular pressure and tissue intrinsic stiffness. Several surgical procedures in ophthalmology alter the biomechanics of the cornea to provoke local or global curvature changes for vision correction. Given the large number of surgical interventions performed every day, there is a growing demand for a deeper understanding of corneal biomechanics to improve the safety of procedures and medical devices. The aim of our work is to propose a numerical model of corneal biomechanics based on the stromal microstructure. Our novel anisotropic constitutive material law features a probabilistic weighting approach to model the collagen fiber distribution observed in the human cornea by X-ray scattering analysis (Aghamohammadzadeh et al., Structure, February 2004). Furthermore, collagen cross-linking was explicitly included in the strain energy function. Results showed that the proposed model successfully reproduces both inflation and extensometry experimental data (Elsheikh et al., Curr Eye Res, 2007; Elsheikh et al., Exp Eye Res, May 2008). In addition, the mechanical properties calculated for patients of different age groups (Group A: 65-79 years; Group B: 80-95 years) show an increase in collagen cross-linking and a decrease in collagen fiber elasticity from younger to older specimens. These findings correspond to what is known about maturing fibrous biological tissue. Since the presented model can handle different loading situations and includes the anisotropic distribution of collagen fibers, it has the potential to simulate clinical procedures involving nonsymmetrical tissue interventions. In the future, such a mechanical model can be used to improve surgical planning and the design of next-generation ophthalmic devices.
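The abstract does not state the strain energy function. As an illustration only, one generic way to weight fiber contributions by an orientation density is shown below. It is not necessarily the authors' law, and it omits their cross-linking term.

```latex
\Psi(\mathbf{C}) = \Psi_{\mathrm{m}}(\bar{I}_1)
  + \int_{0}^{\pi} \rho(\theta)\, \Psi_{\mathrm{f}}\big(\lambda(\theta)\big)\, \mathrm{d}\theta,
\qquad \int_{0}^{\pi} \rho(\theta)\, \mathrm{d}\theta = 1,
\qquad \lambda(\theta)^2 = \mathbf{a}_0(\theta) \cdot \mathbf{C}\, \mathbf{a}_0(\theta),
\qquad \mathbf{a}_0(\theta) = (\cos\theta,\ \sin\theta,\ 0)^{\mathsf{T}}
```

Here $\mathbf{C} = \mathbf{F}^{\mathsf{T}}\mathbf{F}$ is the right Cauchy-Green tensor, $\Psi_{\mathrm{m}}$ is an isotropic matrix term depending on the modified first invariant $\bar{I}_1$, $\rho(\theta)$ is an assumed in-plane fiber orientation density (which could be fitted to X-ray scattering intensities), and $\Psi_{\mathrm{f}}$ is a fiber energy that fiber-reinforced models commonly set to zero for $\lambda \le 1$, so that fibers carry no compressive load.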
Abstract:
A cascading failure is a failure in a system of interconnected parts in which the breakdown of one element can lead to the subsequent collapse of the others. The aim of this paper is to introduce a simple combinatorial model for the study of cascading failures. In particular, with particle systems and Markov random fields in mind, we consider a network of interacting urns arranged on a lattice. Every urn is Pólya-like, and its reinforcement matrix is a function not only of time (time contagion) but also of the behavior of the neighboring urns (spatial contagion) and of a random component, which can represent either pure chance or the impact of exogenous factors. In this way a non-trivial dependence structure among the urns is built, and it is used to study default avalanches over the lattice. Thanks to its flexibility and its interesting probabilistic properties, the construction may be used to model different phenomena characterized by cascading failures, such as power grids and financial networks.
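The abstract does not give the reinforcement rule. The following is a hypothetical simulation in the same spirit, not the paper's construction. It assumes red balls stand for distress, reinforcement is additive, urns sit on a four-neighbour torus, exogenous shocks are Bernoulli events, and an urn "defaults" when its red share reaches a threshold.

```python
import numpy as np

def simulate_urn_lattice(size=10, steps=50, base=1.0, spatial=1.0,
                         shock_prob=0.01, default_share=0.8, seed=0):
    """Interacting Pólya-like urns on a size x size torus.

    Red balls stand for distress. Returns a boolean map of urns whose red share
    reaches `default_share` after `steps` rounds (a hypothetical default rule).
    """
    rng = np.random.default_rng(seed)
    red = np.ones((size, size))
    white = np.ones((size, size))
    for _ in range(steps):
        # Draw one ball from every urn: True means a red (distress) ball was drawn
        drew_red = rng.random((size, size)) < red / (red + white)
        # Number of the four nearest neighbours that drew red (periodic boundary)
        red_neighbours = sum(np.roll(drew_red, shift, axis=ax)
                             for shift in (1, -1) for ax in (0, 1))
        # Random component: occasional exogenous shocks at random sites
        shocks = rng.random((size, size)) < shock_prob
        # Time contagion (reinforce the drawn colour) plus spatial contagion
        red += base * drew_red + spatial * red_neighbours + base * shocks
        white += base * ~drew_red
    return red / (red + white) >= default_share

isolated = simulate_urn_lattice(spatial=0.0, shock_prob=0.0)
contagious = simulate_urn_lattice(spatial=5.0)
print(isolated.sum(), contagious.sum())
```

With `spatial=0` and no shocks, each site is an independent classical Pólya urn. Raising `spatial` shows, in this toy setting, how neighbour-driven reinforcement can turn isolated defaults into an avalanche covering most of the lattice.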
Abstract:
Airway disease in childhood comprises a heterogeneous group of disorders. Attempts to distinguish different phenotypes have generally considered few disease dimensions. The present study examines phenotypes of childhood wheeze and chronic cough by fitting a statistical model to data representing multiple disease dimensions. From a population-based, longitudinal cohort study of 1,650 preschool children, 319 with parent-reported wheeze or chronic cough were included. Phenotypes were identified by latent class analysis using data on symptoms, skin-prick tests, lung function and airway responsiveness from two preschool surveys. These phenotypes were then compared with respect to outcome at school age. The model distinguished three phenotypes of wheeze and two phenotypes of chronic cough. Subsequent wheeze, chronic cough and inhaler use at school age differed clearly between the five phenotypes. The wheeze phenotypes shared features with previously described entities and partly reconciled discrepancies between existing sets of phenotype labels. This novel, multidimensional approach has the potential to identify clinically relevant phenotypes, not only in paediatric disorders but also in adult obstructive airway diseases, where phenotype definition is an equally important issue.
Abstract:
Questionnaire data may contain missing values because certain questions do not apply to all respondents. For instance, questions addressing particular attributes of a symptom, such as frequency, triggers or seasonality, are only applicable to those who have experienced the symptom, while for those who have not, responses to these items will be missing. This missing information does not fall into the category 'missing by design'; rather, the features of interest do not exist and cannot be measured regardless of survey design. Analysis of responses to such conditional items is therefore typically restricted to the subpopulation in which they apply. This article is concerned with joint multivariate modelling of responses to both unconditional and conditional items without restricting the analysis to this subpopulation. Such an approach is of interest when the distributions of both types of responses are thought to be determined by common parameters affecting the whole population. By integrating the conditional item structure into the model, inference can be based both on unconditional data from the entire population and on conditional data from subjects for whom they exist. This approach opens new possibilities for multivariate analysis of such data. We apply this approach to latent class modelling and provide an example using data on respiratory symptoms (wheeze and cough) in children. Conditional data structures such as the one considered here are common in medical research settings and, although our focus is on latent class models, the approach can be applied to other multivariate models.
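The following is a minimal sketch of how conditional items can enter a latent class likelihood. It assumes one binary unconditional ("gate") item, one binary conditional item, local independence within classes, and `np.nan` to mark a conditional item that does not apply. The function and argument names are hypothetical, and the article's own formulation may differ.

```python
import numpy as np

def mixture_log_likelihood(gate, cond, pi, p_gate, p_cond):
    """Latent class log-likelihood with one gate item and one conditional item.

    gate: (n,) 0/1 answers to the unconditional question (e.g. 'any wheeze?').
    cond: (n,) 0/1 answers to the conditional question, np.nan where it does not apply.
    pi, p_gate, p_cond: (K,) class weights, P(gate = 1 | c), P(cond = 1 | gate = 1, c),
    with all probabilities strictly between 0 and 1.
    """
    pi, p_gate, p_cond = (np.asarray(v, dtype=float) for v in (pi, p_gate, p_cond))
    gate = np.asarray(gate, dtype=float)[:, None]   # shape (n, 1)
    cond = np.asarray(cond, dtype=float)[:, None]   # shape (n, 1)
    applies = ~np.isnan(cond)
    c = np.nan_to_num(cond)
    log_gate = gate * np.log(p_gate) + (1 - gate) * np.log(1 - p_gate)   # (n, K)
    log_cond = c * np.log(p_cond) + (1 - c) * np.log(1 - p_cond)         # (n, K)
    # The conditional item contributes only for subjects to whom it applies;
    # for everyone else it is neither imputed nor treated as an ordinary missing value.
    log_joint = np.log(pi) + log_gate + np.where(applies, log_cond, 0.0)
    m = log_joint.max(axis=1, keepdims=True)  # log-sum-exp over classes
    return float(np.sum(m[:, 0] + np.log(np.exp(log_joint - m).sum(axis=1))))

print(mixture_log_likelihood([1, 0], [1, np.nan], [1.0], [0.5], [0.25]))
```

Subjects without the symptom still inform the class weights and the gate probabilities, which is what allows the whole population to enter the analysis.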