39 resultados para hierarchical linear model
Resumo:
This thesis applies a hierarchical latent trait model system to a large quantity of data. The motivation for it was lack of viable approaches to analyse High Throughput Screening datasets which maybe include thousands of data points with high dimensions. High Throughput Screening (HTS) is an important tool in the pharmaceutical industry for discovering leads which can be optimised and further developed into candidate drugs. Since the development of new robotic technologies, the ability to test the activities of compounds has considerably increased in recent years. Traditional methods, looking at tables and graphical plots for analysing relationships between measured activities and the structure of compounds, have not been feasible when facing a large HTS dataset. Instead, data visualisation provides a method for analysing such large datasets, especially with high dimensions. So far, a few visualisation techniques for drug design have been developed, but most of them just cope with several properties of compounds at one time. We believe that a latent variable model (LTM) with a non-linear mapping from the latent space to the data space is a preferred choice for visualising a complex high-dimensional data set. As a type of latent variable model, the latent trait model can deal with either continuous data or discrete data, which makes it particularly useful in this domain. In addition, with the aid of differential geometry, we can imagine the distribution of data from magnification factor and curvature plots. Rather than obtaining the useful information just from a single plot, a hierarchical LTM arranges a set of LTMs and their corresponding plots in a tree structure. We model the whole data set with a LTM at the top level, which is broken down into clusters at deeper levels of t.he hierarchy. In this manner, the refined visualisation plots can be displayed in deeper levels and sub-clusters may be found. Hierarchy of LTMs is trained using expectation-maximisation (EM) algorithm to maximise its likelihood with respect to the data sample. Training proceeds interactively in a recursive fashion (top-down). The user subjectively identifies interesting regions on the visualisation plot that they would like to model in a greater detail. At each stage of hierarchical LTM construction, the EM algorithm alternates between the E- and M-step. Another problem that can occur when visualising a large data set is that there may be significant overlaps of data clusters. It is very difficult for the user to judge where centres of regions of interest should be put. We address this problem by employing the minimum message length technique, which can help the user to decide the optimal structure of the model. In this thesis we also demonstrate the applicability of the hierarchy of latent trait models in the field of document data mining.
Resumo:
Visualization has proven to be a powerful and widely-applicable tool the analysis and interpretation of data. Most visualization algorithms aim to find a projection from the data space down to a two-dimensional visualization space. However, for complex data sets living in a high-dimensional space it is unlikely that a single two-dimensional projection can reveal all of the interesting structure. We therefore introduce a hierarchical visualization algorithm which allows the complete data set to be visualized at the top level, with clusters and sub-clusters of data points visualized at deeper levels. The algorithm is based on a hierarchical mixture of latent variable models, whose parameters are estimated using the expectation-maximization algorithm. We demonstrate the principle of the approach first on a toy data set, and then apply the algorithm to the visualization of a synthetic data set in 12 dimensions obtained from a simulation of multi-phase flows in oil pipelines and to data in 36 dimensions derived from satellite images.
Resumo:
Hierarchical visualization systems are desirable because a single two-dimensional visualization plot may not be sufficient to capture all of the interesting aspects of complex high-dimensional data sets. We extend an existing locally linear hierarchical visualization system PhiVis [1] in several directions: bf(1) we allow for em non-linear projection manifolds (the basic building block is the Generative Topographic Mapping -- GTM), bf(2) we introduce a general formulation of hierarchical probabilistic models consisting of local probabilistic models organized in a hierarchical tree, bf(3) we describe folding patterns of low-dimensional projection manifold in high-dimensional data space by computing and visualizing the manifold's local directional curvatures. Quantities such as magnification factors [3] and directional curvatures are helpful for understanding the layout of the nonlinear projection manifold in the data space and for further refinement of the hierarchical visualization plot. Like PhiVis, our system is statistically principled and is built interactively in a top-down fashion using the EM algorithm. We demonstrate the visualization system principle of the approach on a complex 12-dimensional data set and mention possible applications in the pharmaceutical industry.
Resumo:
Classical studies of area summation measure contrast detection thresholds as a function of grating diameter. Unfortunately, (i) this approach is compromised by retinal inhomogeneity and (ii) it potentially confounds summation of signal with summation of internal noise. The Swiss cheese stimulus of T. S. Meese and R. J. Summers (2007) and the closely related Battenberg stimulus of T. S. Meese (2010) were designed to avoid these problems by keeping target diameter constant and modulating interdigitated checks of first-order carrier contrast within the stimulus region. This approach has revealed a contrast integration process with greater potency than the classical model of spatial probability summation. Here, we used Swiss cheese stimuli to investigate the spatial limits of contrast integration over a range of carrier frequencies (1–16 c/deg) and raised plaid modulator frequencies (0.25–32 cycles/check). Subthreshold summation for interdigitated carrier pairs remained strong (~4 to 6 dB) up to 4 to 8 cycles/check. Our computational analysis of these results implied linear signal combination (following square-law transduction) over either (i) 12 carrier cycles or more or (ii) 1.27 deg or more. Our model has three stages of summation: short-range summation within linear receptive fields, medium-range integration to compute contrast energy for multiple patches of the image, and long-range pooling of the contrast integrators by probability summation. Our analysis legitimizes the inclusion of widespread integration of signal (and noise) within hierarchical image processing models. It also confirms the individual differences in the spatial extent of integration that emerge from our approach.
Resumo:
Linear programming (LP) is the most widely used optimization technique for solving real-life problems because of its simplicity and efficiency. Although conventional LP models require precise data, managers and decision makers dealing with real-world optimization problems often do not have access to exact values. Fuzzy sets have been used in the fuzzy LP (FLP) problems to deal with the imprecise data in the decision variables, objective function and/or the constraints. The imprecisions in the FLP problems could be related to (1) the decision variables; (2) the coefficients of the decision variables in the objective function; (3) the coefficients of the decision variables in the constraints; (4) the right-hand-side of the constraints; or (5) all of these parameters. In this paper, we develop a new stepwise FLP model where fuzzy numbers are considered for the coefficients of the decision variables in the objective function, the coefficients of the decision variables in the constraints and the right-hand-side of the constraints. In the first step, we use the possibility and necessity relations for fuzzy constraints without considering the fuzzy objective function. In the subsequent step, we extend our method to the fuzzy objective function. We use two numerical examples from the FLP literature for comparison purposes and to demonstrate the applicability of the proposed method and the computational efficiency of the procedures and algorithms. © 2013-IOS Press and the authors. All rights reserved.
Resumo:
Purpose – The purpose of this research is to develop a holistic approach to maximize the customer service level while minimizing the logistics cost by using an integrated multiple criteria decision making (MCDM) method for the contemporary transshipment problem. Unlike the prevalent optimization techniques, this paper proposes an integrated approach which considers both quantitative and qualitative factors in order to maximize the benefits of service deliverers and customers under uncertain environments. Design/methodology/approach – This paper proposes a fuzzy-based integer linear programming model, based on the existing literature and validated with an example case. The model integrates the developed fuzzy modification of the analytic hierarchy process (FAHP), and solves the multi-criteria transshipment problem. Findings – This paper provides several novel insights about how to transform a company from a cost-based model to a service-dominated model by using an integrated MCDM method. It suggests that the contemporary customer-driven supply chain remains and increases its competitiveness from two aspects: optimizing the cost and providing the best service simultaneously. Research limitations/implications – This research used one illustrative industry case to exemplify the developed method. Considering the generalization of the research findings and the complexity of the transshipment service network, more cases across multiple industries are necessary to further enhance the validity of the research output. Practical implications – The paper includes implications for the evaluation and selection of transshipment service suppliers, the construction of optimal transshipment network as well as managing the network. Originality/value – The major advantages of this generic approach are that both quantitative and qualitative factors under fuzzy environment are considered simultaneously and also the viewpoints of service deliverers and customers are focused. Therefore, it is believed that it is useful and applicable for the transshipment service network design.
Resumo:
A travelling-wave model of a semiconductor optical amplifier based non-linear loop mirror is developed to investigate the importance of travelling-wave effects and gain/phase dynamics in predicting device behaviour. A constant effective carrier recovery lifetime approximation is found to be reasonably accurate (±10%) within a wide range of control pulse energies. Based on this approximation, a heuristic model is developed for maximum computational efficiency. The models are applied to a particular configuration involving feedback.
Resumo:
Two-stage data envelopment analysis (DEA) efficiency models identify the efficient frontier of a two-stage production process. In some two-stage processes, the inputs to the first stage are shared by the second stage, known as shared inputs. This paper proposes a new relational linear DEA model for dealing with measuring the efficiency score of two-stage processes with shared inputs under constant returns-to-scale assumption. Two case studies of banking industry and university operations are taken as two examples to illustrate the potential applications of the proposed approach.
Resumo:
Using survey data from 358 online customers, the study finds that the e-service quality construct conforms to the structure of a third-order factor model that links online service quality perceptions to distinct and actionable dimensions, including (1) website design, (2) fulfilment, (3) customer service, and (4) security/privacy. Each dimension is found to consist of several attributes that define the basis of e-service quality perceptions. A comprehensive specification of the construct, which includes attributes not covered in existing scales, is developed. The study contrasts a formative model consisting of 4 dimensions and 16 attributes against a reflective conceptualization. The results of this comparison indicate that studies using an incorrectly specified model overestimate the importance of certain e-service quality attributes. Global fit criteria are also found to support the detection of measurement misspecification. Meta-analytic data from 31,264 online customers are used to show that the developed measurement predicts customer behavior better than widely used scales, such as WebQual and E-S-Qual. The results show that the new measurement enables managers to assess e-service quality more accurately and predict customer behavior more reliably.