814 resultados para Hierarchical clustering model
Resumo:
This paper incorporates hierarchical structure into the neoclassical theory of the firm. Firms are hierarchical in two respects: the organization of workers in production and the wage structure. The firm’s hierarchy is represented as the sector of a circle, where the radius represents the hierarchy’s height, the width of the sector represents the breadth of the hierarchy at a given height, and the angle of the sector represents span of control for any given supervisor. A perfectly competitive firm then chooses height and width, as well as capital inputs, in order to maximize profit. We analyze the short run and long run impact of changes in scale economies, input substitutability and input and output prices on the firm’s hierarchical structure. We find that the firm unambiguously becomes more hierarchical as the specialization of its workers increases or as its output price increases relative to input prices. The effect of changes in scale economies is contingent on the output price. The model also brings forth an analysis of wage inequality within the firm, which is found to be independent of technological considerations, and only depends on the firm’s wage schedule.
Resumo:
It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system PhiVis ¸iteBishop98a in several directions: bf(1) We allow for em non-linear projection manifolds. The basic building block is the Generative Topographic Mapping. bf(2) We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. bf(3) Using tools from differential geometry we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the parent visualization plot which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 19-dimensional data sets.
Resumo:
It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system PhiVis ¸iteBishop98a in several directions: bf(1) We allow for em non-linear projection manifolds. The basic building block is the Generative Topographic Mapping (GTM). bf(2) We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. bf(3) Using tools from differential geometry we derive expressions for local directional curvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the ancestor visualization plots which are captured by a child model. We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set and apply our system to two more complex 12- and 18-dimensional data sets.
Resumo:
An interactive hierarchical Generative Topographic Mapping (HGTM) ¸iteHGTM has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in 3 directions: bf (1) We generalize HGTM to noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM) developed in ¸iteKabanpami. bf (2) We give the user a choice of initializing the child plots of the current plot in either em interactive, or em automatic mode. In the interactive mode the user interactively selects ``regions of interest'' as in ¸iteHGTM, whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of LTMs is employed. bf (3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.
Resumo:
It has been argued that a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex data sets, and therefore a hierarchical visualization system is desirable. In this paper we extend an existing locally linear hierarchical visualization system PhiVis (Bishop98a) in several directions: 1. We allow for em non-linear projection manifolds. The basic building block is the Generative Topographic Mapping. 2. We introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree. General training equations are derived, regardless of the position of the model in the tree. 3. Using tools from differential geometry we derive expressions for local directionalcurvatures of the projection manifold. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. It enables the user to interactively highlight those data in the parent visualization plot which are captured by a child model.We also incorporate into our system a hierarchical, locally selective representation of magnification factors and directional curvatures of the projection manifolds. Such information is important for further refinement of the hierarchical visualization plot, as well as for controlling the amount of regularization imposed on the local models. We demonstrate the principle of the approach on a toy data set andapply our system to two more complex 12- and 19-dimensional data sets.
Resumo:
When object databases arrived on the scene some ten years ago, they provided database capabilities for previously neglected, complex applications, such as CAD, but were burdened with one inherent teething problem, poor performance. Physical database design is one tool that can provide performance improvements and it is the general area of concern for this thesis. Clustering is one fruitful design technique which can provide improvements in performance. However, clustering in object databases has not been explored in depth and so has not been truly exploited. Further, clustering, although a physical concern, can be determined from the logical model. The object model is richer than previous models, notably the relational model, and so it is anticipated that the opportunities with respect to clustering are greater. This thesis provides a thorough analysis of object clustering strategies with a view to highlighting any links between the object logical and physical model and improving performance. This is achieved by considering all possible types of object logical model construct and the implementation of those constructs in terms of theoretical clusterings strategies to produce actual clustering arrangements. This analysis results in a greater understanding of object clustering strategies, aiding designers in the development process and providing some valuable rules of thumb to support the design process.
Resumo:
We investigate the sensitivity of a Markov model with states and transition probabilities obtained from clustering a molecular dynamics trajectory. We have examined a 500 ns molecular dynamics trajectory of the peptide valine-proline-alanine-leucine in explicit water. The sensitivity is quantified by varying the boundaries of the clusters and investigating the resulting variation in transition probabilities and the average transition time between states. In this way, we represent the effect of clustering using different clustering algorithms. It is found that in terms of the investigated quantities, the peptide dynamics described by the Markov model is sensitive to the clustering; in particular, the average transition times are found to vary up to 46%. Moreover, inclusion of nonphysical sparsely populated clusters can lead to serious errors of up to 814%. In the investigation, the time step used in the transition matrix is determined by the minimum time scale on which the system behaves approximately Markovian. This time step is found to be about 100 ps. It is concluded that the description of peptide dynamics with transition matrices should be performed with care, and that using standard clustering algorithms to obtain states and transition probabilities may not always produce reliable results.
Resumo:
Classical studies of area summation measure contrast detection thresholds as a function of grating diameter. Unfortunately, (i) this approach is compromised by retinal inhomogeneity and (ii) it potentially confounds summation of signal with summation of internal noise. The Swiss cheese stimulus of T. S. Meese and R. J. Summers (2007) and the closely related Battenberg stimulus of T. S. Meese (2010) were designed to avoid these problems by keeping target diameter constant and modulating interdigitated checks of first-order carrier contrast within the stimulus region. This approach has revealed a contrast integration process with greater potency than the classical model of spatial probability summation. Here, we used Swiss cheese stimuli to investigate the spatial limits of contrast integration over a range of carrier frequencies (1–16 c/deg) and raised plaid modulator frequencies (0.25–32 cycles/check). Subthreshold summation for interdigitated carrier pairs remained strong (~4 to 6 dB) up to 4 to 8 cycles/check. Our computational analysis of these results implied linear signal combination (following square-law transduction) over either (i) 12 carrier cycles or more or (ii) 1.27 deg or more. Our model has three stages of summation: short-range summation within linear receptive fields, medium-range integration to compute contrast energy for multiple patches of the image, and long-range pooling of the contrast integrators by probability summation. Our analysis legitimizes the inclusion of widespread integration of signal (and noise) within hierarchical image processing models. It also confirms the individual differences in the spatial extent of integration that emerge from our approach.
Resumo:
This thesis applies a hierarchical latent trait model system to a large quantity of data. The motivation for it was lack of viable approaches to analyse High Throughput Screening datasets which maybe include thousands of data points with high dimensions. High Throughput Screening (HTS) is an important tool in the pharmaceutical industry for discovering leads which can be optimised and further developed into candidate drugs. Since the development of new robotic technologies, the ability to test the activities of compounds has considerably increased in recent years. Traditional methods, looking at tables and graphical plots for analysing relationships between measured activities and the structure of compounds, have not been feasible when facing a large HTS dataset. Instead, data visualisation provides a method for analysing such large datasets, especially with high dimensions. So far, a few visualisation techniques for drug design have been developed, but most of them just cope with several properties of compounds at one time. We believe that a latent variable model (LTM) with a non-linear mapping from the latent space to the data space is a preferred choice for visualising a complex high-dimensional data set. As a type of latent variable model, the latent trait model can deal with either continuous data or discrete data, which makes it particularly useful in this domain. In addition, with the aid of differential geometry, we can imagine the distribution of data from magnification factor and curvature plots. Rather than obtaining the useful information just from a single plot, a hierarchical LTM arranges a set of LTMs and their corresponding plots in a tree structure. We model the whole data set with a LTM at the top level, which is broken down into clusters at deeper levels of t.he hierarchy. In this manner, the refined visualisation plots can be displayed in deeper levels and sub-clusters may be found. Hierarchy of LTMs is trained using expectation-maximisation (EM) algorithm to maximise its likelihood with respect to the data sample. Training proceeds interactively in a recursive fashion (top-down). The user subjectively identifies interesting regions on the visualisation plot that they would like to model in a greater detail. At each stage of hierarchical LTM construction, the EM algorithm alternates between the E- and M-step. Another problem that can occur when visualising a large data set is that there may be significant overlaps of data clusters. It is very difficult for the user to judge where centres of regions of interest should be put. We address this problem by employing the minimum message length technique, which can help the user to decide the optimal structure of the model. In this thesis we also demonstrate the applicability of the hierarchy of latent trait models in the field of document data mining.
Resumo:
Hierarchical knowledge structures are frequently used within clinical decision support systems as part of the model for generating intelligent advice. The nodes in the hierarchy inevitably have varying influence on the decisionmaking processes, which needs to be reflected by parameters. If the model has been elicited from human experts, it is not feasible to ask them to estimate the parameters because there will be so many in even moderately-sized structures. This paper describes how the parameters could be obtained from data instead, using only a small number of cases. The original method [1] is applied to a particular web-based clinical decision support system called GRiST, which uses its hierarchical knowledge to quantify the risks associated with mental-health problems. The knowledge was elicited from multidisciplinary mental-health practitioners but the tree has several thousand nodes, all requiring an estimation of their relative influence on the assessment process. The method described in the paper shows how they can be obtained from about 200 cases instead. It greatly reduces the experts’ elicitation tasks and has the potential for being generalised to similar knowledge-engineering domains where relative weightings of node siblings are part of the parameter space.
Resumo:
Molecular dynamics (MD) has been used to identify the relative distribution of dysprosium in the phosphate glass DyAl0.30P3.05O9.62. The MD model has been compared directly with experimental data obtained from neutron diffraction to enable a detailed comparison beyond the total structure factor level. The MD simulation gives Dy ... Dy correlations at 3.80(5) and 6.40(5) angstrom with relative coordination numbers of 0.8(1) and 7.3(5), thus providing evidence of minority rare-earth clustering within these glasses. The nearest neighbour Dy-O peak occurs at 2.30 angstrom with each Dy atom having on average 5.8 nearest neighbour oxygen atoms. The MD simulation is consistent with the phosphate network model based on interlinked PO4 tetrahedra where the addition of network modifiers Dy3+ depolymerizes the phosphate network through the breakage of P-(O)-P bonds whilst leaving the tetrahedral units intact. The role of aluminium within the network has been taken into explicit account, and A1 is found to be predominantly (78 tetrahedrally coordinated. In fact all four A1 bonds are found to be to P (via an oxygen atom) with negligible amounts of Al-O-Dy bonds present. This provides an important insight into the role of Al additives in improving the mechanical properties of these glasses.
Resumo:
The 21-day experimental gingivitis model, an established noninvasive model of inflammation in response to increasing bacterial accumulation in humans, is designed to enable the study of both the induction and resolution of inflammation. Here, we have analyzed gingival crevicular fluid, an oral fluid comprising a serum transudate and tissue exudates, by LC-MS/MS using Fourier transform ion cyclotron resonance mass spectrometry and iTRAQ isobaric mass tags, to establish meta-proteomic profiles of inflammation-induced changes in proteins in healthy young volunteers. Across the course of experimentally induced gingivitis, we identified 16 bacterial and 186 human proteins. Although abundances of the bacterial proteins identified did not vary temporally, Fusobacterium outer membrane proteins were detected. Fusobacterium species have previously been associated with periodontal health or disease. The human proteins identified spanned a wide range of compartments (both extracellular and intracellular) and functions, including serum proteins, proteins displaying antibacterial properties, and proteins with functions associated with cellular transcription, DNA binding, the cytoskeleton, cell adhesion, and cilia. PolySNAP3 clustering software was used in a multilayered analytical approach. Clusters of proteins that associated with changes to the clinical parameters included neuronal and synapse associated proteins.
Resumo:
This dissertation investigates the very important and current problem of modelling human expertise. This is an apparent issue in any computer system emulating human decision making. It is prominent in Clinical Decision Support Systems (CDSS) due to the complexity of the induction process and the vast number of parameters in most cases. Other issues such as human error and missing or incomplete data present further challenges. In this thesis, the Galatean Risk Screening Tool (GRiST) is used as an example of modelling clinical expertise and parameter elicitation. The tool is a mental health clinical record management system with a top layer of decision support capabilities. It is currently being deployed by several NHS mental health trusts across the UK. The aim of the research is to investigate the problem of parameter elicitation by inducing them from real clinical data rather than from the human experts who provided the decision model. The induced parameters provide an insight into both the data relationships and how experts make decisions themselves. The outcomes help further understand human decision making and, in particular, help GRiST provide more accurate emulations of risk judgements. Although the algorithms and methods presented in this dissertation are applied to GRiST, they can be adopted for other human knowledge engineering domains.
Resumo:
In Statnote 9, we described a one-way analysis of variance (ANOVA) ‘random effects’ model in which the objective was to estimate the degree of variation of a particular measurement and to compare different sources of variation in space and time. The illustrative scenario involved the role of computer keyboards in a University communal computer laboratory as a possible source of microbial contamination of the hands. The study estimated the aerobic colony count of ten selected keyboards with samples taken from two keys per keyboard determined at 9am and 5pm. This type of design is often referred to as a ‘nested’ or ‘hierarchical’ design and the ANOVA estimated the degree of variation: (1) between keyboards, (2) between keys within a keyboard, and (3) between sample times within a key. An alternative to this design is a 'fixed effects' model in which the objective is not to measure sources of variation per se but to estimate differences between specific groups or treatments, which are regarded as 'fixed' or discrete effects. This statnote describes two scenarios utilizing this type of analysis: (1) measuring the degree of bacterial contamination on 2p coins collected from three types of business property, viz., a butcher’s shop, a sandwich shop, and a newsagent and (2) the effectiveness of drugs in the treatment of a fungal eye infection.