869 results for hierarchical classification system
Abstract:
The data available during the drug discovery process is vast in amount and diverse in nature. To gain useful information from such data, an effective visualisation tool is required. To provide better visualisation facilities to domain experts (screening scientists, biologists, chemists, etc.), we developed software based on recently developed principled visualisation algorithms such as Generative Topographic Mapping (GTM) and Hierarchical Generative Topographic Mapping (HGTM). The software also supports conventional visualisation techniques such as Principal Component Analysis (PCA), NeuroScale, PhiVis, and Locally Linear Embedding (LLE). It also provides global and local regression facilities, supporting regression algorithms such as the Multilayer Perceptron (MLP), Radial Basis Function network (RBF), Generalised Linear Models (GLM), Mixture of Experts (MoE), and the newly developed Guided Mixture of Experts (GME). This user manual gives an overview of the purpose of the software tool, highlights some of the issues to be taken care of when creating a new model, and explains how to install and use the tool. The manual does not require readers to be familiar with the algorithms it implements; basic computing skills are enough to operate the software.
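To make the mixture-of-experts idea concrete, here is a minimal sketch of how an MoE combines expert predictions through a softmax gate. It assumes linear gate scores and two hand-written linear experts; the function names `softmax` and `moe_predict` are hypothetical, and the sketch does not reproduce the software's MoE training or its Guided Mixture of Experts variant.

```python
import math
from typing import Callable, List, Sequence, Tuple


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax: turns gate scores into mixing proportions."""
    top = max(scores)
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def moe_predict(x: float,
                experts: Sequence[Callable[[float], float]],
                gates: Sequence[Tuple[float, float]]) -> float:
    """Mixture-of-experts output y(x) = sum_k g_k(x) * f_k(x).

    Assumption for this sketch: each gate score is linear in x,
    s_k(x) = a_k * x + b_k, and the gates g_k(x) are a softmax over the s_k.
    """
    if len(experts) != len(gates):
        raise ValueError("need exactly one gate per expert")
    g = softmax([a * x + b for a, b in gates])
    return sum(gk * f(x) for gk, f in zip(g, experts))


# Hypothetical experts: one linear model per input region.
experts = [lambda x: 2 * x + 1, lambda x: -x + 3]
gates = [(1.0, 0.0), (-1.0, 0.0)]  # gate 1 favours x > 0, gate 2 favours x < 0
print(moe_predict(0.0, experts, gates))  # equal gating: 0.5 * 1 + 0.5 * 3 = 2.0
```

Away from the origin one gate dominates, so the prediction approaches the corresponding expert; a trained MoE learns both the gate and expert parameters from data.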
Abstract:
An interactive hierarchical Generative Topographic Mapping (HGTM) [HGTM] has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in three directions. (1) We generalise HGTM to noise models from the exponential family of distributions; the basic building block is the Latent Trait Model (LTM) developed in [Kabanpami]. (2) We give the user a choice of initialising the child plots of the current plot in either interactive or automatic mode. In the interactive mode the user selects "regions of interest" as in [HGTM], whereas in the automatic mode an unsupervised construction of a mixture of LTMs, driven by minimum message length (MML), is employed. (3) We derive general formulas for magnification factors in latent trait models. Magnification factors help in interpreting the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, which makes the interactive mode difficult to use. Such a situation often arises when visualising large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.
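As an illustration of what a magnification factor measures, the sketch below computes the local area magnification sqrt(det(J^T J)) of a mapping from a 2-D latent space into data space, with the Jacobian J estimated by central differences. This is the Euclidean form familiar from GTM and is an assumption here; the paper's general LTM formulas, which depend on the chosen noise model, are not reproduced. The functions `jacobian` and `magnification_factor` are hypothetical names.

```python
import math
from typing import Callable, List, Sequence

Mapping = Callable[[Sequence[float]], List[float]]


def jacobian(f: Mapping, z: Sequence[float], eps: float = 1e-6) -> List[List[float]]:
    """Central-difference Jacobian of f: R^L -> R^D at z, as a D x L matrix."""
    columns = []
    for j in range(len(z)):
        z_plus, z_minus = list(z), list(z)
        z_plus[j] += eps
        z_minus[j] -= eps
        f_plus, f_minus = f(z_plus), f(z_minus)
        columns.append([(p - m) / (2 * eps) for p, m in zip(f_plus, f_minus)])
    return [list(row) for row in zip(*columns)]  # transpose columns -> rows


def magnification_factor(f: Mapping, z: Sequence[float]) -> float:
    """Area magnification sqrt(det(J^T J)) for a 2-D latent space."""
    J = jacobian(f, z)
    # Entries of the 2 x 2 metric tensor G = J^T J.
    g11 = sum(r[0] * r[0] for r in J)
    g12 = sum(r[0] * r[1] for r in J)
    g22 = sum(r[1] * r[1] for r in J)
    det = g11 * g22 - g12 * g12
    return math.sqrt(max(det, 0.0))  # clamp tiny negative round-off


# Hypothetical curved manifold embedded in 3-D data space.
curved = lambda z: [z[0], z[1], z[0] ** 2]
print(magnification_factor(curved, [1.0, 0.0]))  # approximately sqrt(5)
```

Large values mark latent regions that are stretched in data space, which is why magnification-factor plots tend to outline cluster boundaries.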
Abstract:
Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. Exploring heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. miniDVMS v1.8 provides a flexible visual data mining framework that combines advanced projection algorithms developed in the machine learning domain with visual techniques developed in the information visualisation domain. The advantage of this interface is that the user is directly involved in the data mining process. Principled projection methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), are integrated with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, and user interaction facilities, to provide this integrated visual data mining framework. The software also supports conventional visualisation techniques such as principal component analysis (PCA), NeuroScale, and PhiVis. This user manual gives an overview of the purpose of the software tool, highlights some of the issues to be taken care of when creating a new model, and explains how to install and use the tool. The manual does not require readers to be familiar with the algorithms it implements; basic computing skills are enough to operate the software.
Abstract:
National meteorological offices are largely concerned with synoptic-scale forecasting, in which weather predictions are produced for a whole country 24 hours ahead. In practice, many local organisations (such as emergency services, construction industries, forestry, farming, and sports) require only local, short-term, bespoke weather predictions and warnings. This thesis shows that these less demanding requirements do not need exceptional computing power and can be met by a modern desktop system that monitors site-specific ground conditions (such as temperature, pressure, wind speed and direction) augmented with above-ground information from satellite images to produce 'nowcasts'. The emphasis of this thesis is on the design of such a real-time nowcasting system. Local site-specific conditions are monitored using a custom-built, stand-alone sub-system based on the Motorola 6809. Above-ground information is received from the METEOSAT 4 geostationary satellite using a sub-system based on commercially available equipment. The information is ephemeral and must be captured in real time. The real-time nowcasting system for localised weather handles the data as a transparent task within the limited capabilities of the PC. Ground data form a time series of measurements at a specific location, representing the past-to-present atmospheric conditions of that site, from which much information can be extracted. The novel approach adopted in this thesis is to construct stochastic models of this series based on the AutoRegressive Integrated Moving Average (ARIMA) technique. The satellite images contain features (such as cloud formations) that evolve dynamically and may move, grow, distort, bifurcate, superpose, or disappear between images.
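A minimal sketch of the ARIMA idea applied to a ground-station series: difference the series once, fit an AR(1) coefficient by least squares, and forecast by undoing the differencing. The model order (1,1,0), the absence of an intercept, and the function names are assumptions for illustration; the thesis does not state which orders it used, and a full Box-Jenkins workflow would identify them from the data.

```python
from typing import List, Sequence, Tuple


def difference(x: Sequence[float]) -> List[float]:
    """First differences: the 'I' (d = 1) part of ARIMA."""
    return [b - a for a, b in zip(x, x[1:])]


def fit_ar1(d: Sequence[float]) -> float:
    """Least-squares AR(1) coefficient without intercept: d_t ~ phi * d_{t-1}."""
    num = sum(cur * prev for prev, cur in zip(d, d[1:]))
    den = sum(prev * prev for prev in d[:-1])
    return num / den if den else 0.0


def arima110_forecast(x: Sequence[float], steps: int = 1) -> Tuple[float, List[float]]:
    """Fit ARIMA(1,1,0) to x and forecast `steps` values ahead."""
    if len(x) < 3:
        raise ValueError("need at least three observations")
    d = difference(x)
    phi = fit_ar1(d)
    level, last_diff = x[-1], d[-1]
    forecasts = []
    for _ in range(steps):
        last_diff = phi * last_diff  # predicted next difference
        level = level + last_diff    # undo the differencing
        forecasts.append(level)
    return phi, forecasts


# Hypothetical hourly temperature readings (degrees C) from a ground station.
temps = [10.0, 14.0, 16.0, 17.0, 17.5]
phi, ahead = arima110_forecast(temps, steps=2)
print(phi, ahead)  # 0.5 [17.75, 17.875]
```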
The process of extracting a weather feature, following its motion, and predicting its future evolution involves algorithms for normalisation, partitioning, filtering, image enhancement, and correlation of multi-dimensional signals in different domains. To limit the processing requirements, the analysis in this thesis concentrates on an 'area of interest'. Only a small fraction of the total image then needs to be processed, leading to a major saving in time. The thesis also proposes an extension to an existing manual cloud classification technique so that it can automatically classify a cloud feature over the 'area of interest' for nowcasting, using the multi-dimensional signals.
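The correlation step for following a feature's motion can be sketched as a search for the displacement that maximises the cross-correlation between two consecutive images restricted to the area of interest. This uses raw (unnormalised) cross-correlation on hypothetical grids; in practice a normalised cross-correlation is usually preferred so that uniformly bright regions do not dominate, and the thesis's own correlation procedure is not reproduced. `best_shift` is a hypothetical name.

```python
from typing import List, Tuple

Grid = List[List[float]]


def best_shift(prev: Grid, curr: Grid, max_shift: int) -> Tuple[int, int]:
    """Return the (dy, dx) displacement that maximises the cross-correlation
    sum over y, x of prev[y][x] * curr[y + dy][x + dx] within +/- max_shift."""
    h, w = len(prev), len(prev[0])
    best, best_score = (0, 0), float("-inf")
    for dy in range(-max_shift, max_shift + 1):
        for dx in range(-max_shift, max_shift + 1):
            score = 0.0
            for y in range(h):
                yy = y + dy
                if not 0 <= yy < h:
                    continue  # shifted row falls outside the area of interest
                for x in range(w):
                    xx = x + dx
                    if 0 <= xx < w:
                        score += prev[y][x] * curr[yy][xx]
            if score > best_score:
                best_score, best = score, (dy, dx)
    return best


# Hypothetical 5 x 5 area of interest: a bright cloud cell moves one row
# down and two columns right between consecutive images.
before = [[0.0] * 5 for _ in range(5)]
after = [[0.0] * 5 for _ in range(5)]
before[1][1] = 9.0
after[2][3] = 9.0
print(best_shift(before, after, max_shift=2))  # (1, 2)
```

Restricting the search to a small window and a small `max_shift` is what keeps the cost low, in line with the area-of-interest rationale above.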
Abstract:
The G-protein coupled receptors (GPCRs) comprise one of the largest and also one of the most multi-functional protein families known to modern molecular bioscience. From a drug discovery and pharmaceutical industry perspective, the GPCRs are one of the most commercially and economically important groups of proteins known. The GPCRs undertake numerous vital metabolic functions and interact with a hugely diverse range of small and large ligands. Many different methodologies have been developed to classify the GPCRs efficiently and accurately, ranging from motif-based techniques to machine learning, as well as a variety of alignment-free techniques based on the physicochemical properties of sequences. We review here the available methodologies for the classification of GPCRs. Part of this work focuses on how we have tried to build the intrinsically hierarchical nature of the sequence relations implicit within the family into an adaptive approach to classification. Importantly, we also touch on some of the key innate problems in developing an effective approach to classifying the GPCRs: the lack of sequence similarity between the six classes that comprise the GPCR family, and the low sequence similarity to other family members shown by many newly revealed members of the family.
Abstract:
The IRDS standard is an international standard produced by the International Organization for Standardization (ISO). In this work, the process for producing standards in formal standards organisations, such as the ISO, and in more informal bodies, such as the Object Management Group (OMG), is examined. This thesis examines previous models and classifications of standards, which are then combined to produce a new classification. The IRDS standard is placed in a class in the new model as a reference anticipatory standard. Anticipatory standards are developed ahead of the technology in an attempt to guide the market. The diffusion of the IRDS is traced over a period of eleven years. The economic conditions that affect the diffusion of standards are examined, particularly those that prevail in compatibility markets such as the IT and ICT markets. Additionally, the consequences of introducing gateway or converter devices into a market where a standard has not yet been established are examined. The IRDS standard did not have an installed base, and this hindered its diffusion. The thesis concludes that the IRDS standard was overtaken by new developments such as object-oriented technologies and middleware. This was partly because of the slow process of developing standards in traditional organisations, which operate on a consensus basis, and partly because the IRDS standard did not have an installed base. In addition, the rise and proliferation of middleware products made exchange mechanisms, rather than repository solutions, dominant. The research method used in this work is a longitudinal study of the development and diffusion of the ISO/IEC IRDS standard. The research is regarded as a single case study and follows the interpretative epistemological point of view.
Abstract:
The absence of a definitive approach to the design of manufacturing systems signifies the importance of a control mechanism to ensure the timely application of relevant design techniques. To provide effective control, design development needs to be continually assessed in relation to the required system performance, which can only be achieved analytically through computer simulation. This technique provides the only method of accurately replicating the highly complex and dynamic interrelationships inherent within manufacturing facilities and realistically predicting system behaviour. Owing to the unique capabilities of computer simulation, its application should support and encourage a thorough investigation of all alternative designs, allowing attention to focus specifically on critical design areas and enabling continuous assessment of system evolution. To achieve this, system analysis needs to be efficient, in terms of both data requirements and the speed and accuracy of evaluation. To provide an effective control mechanism, a hierarchical or multi-level modelling procedure has therefore been developed, specifying the appropriate degree of evaluation support necessary at each phase of design. An underlying assumption of the proposal is that evaluation is quick and easy and allows models to expand in line with design developments. However, current approaches to computer simulation are wholly inappropriate for supporting hierarchical evaluation. Implementation of computer simulation through traditional approaches is typically characterized by a requirement for very specialist expertise, a lengthy model development phase, and correspondingly high expenditure, resulting in very little and rather inappropriate use of the technique. Simulation, when used, is generally applied only to check or verify a final design proposal. Rarely is the full potential of computer simulation utilized to aid, support, or complement the manufacturing system design procedure.
To implement the proposed modelling procedure, the concept of a generic simulator was therefore adopted, as such systems require no specialist expertise and instead facilitate quick and easy model creation, execution, and modification through simple data inputs. Previously, generic simulators have tended to be too restricted, lacking the flexibility needed to be generally applicable to manufacturing systems. Development of the ATOMS manufacturing simulator, however, has shown that such systems can be relevant to a wide range of applications, as well as verifying the benefits of multi-level modelling.
Abstract:
This thesis deals with the problem of Information Systems design for Corporate Management. It shows that the results of applying current approaches to Management Information Systems and Corporate Modelling fully justify a fresh look at the problem. The thesis develops an approach to design based on Cybernetic principles and theories. It looks at Management as an informational process and discusses the relevance of regulation theory to its practice. The work proceeds around the concept of change and its effects on the organization's stability and survival. The idea of looking at organizations as viable systems is discussed, and a design to enhance survival capacity is developed. Taking Ashby's theory of adaptation and developments on ultra-stability as a theoretical framework, and considering conditions for learning and foresight, the thesis deduces that a design should include three basic components: a dynamic model of the organization-environment relationships; a method to spot significant changes in the values of the essential variables and in a certain set of parameters; and a Controller able to conceive and change the other two elements and to make choices among alternative policies. Further consideration of the conditions for rapid adaptation in organisms composed of many parts, together with the law of Requisite Variety, shows that successful adaptive behaviour requires a certain functional organization. Beer's model of viable organizations is related to Ashby's theory of adaptation and regulation. Using the ultra-stable system as an abstract unit of analysis permits the development of a rigorous taxonomy of change: it starts by distinguishing between change within behaviour and change of behaviour, and completes the classification with organizational change. It relates these changes to the logical categories of learning, connecting the topic of Information System design with that of organizational learning.
Abstract:
This thesis applies a hierarchical latent trait model system to a large quantity of data. The motivation was the lack of viable approaches for analysing High Throughput Screening datasets, which may include thousands of high-dimensional data points. High Throughput Screening (HTS) is an important tool in the pharmaceutical industry for discovering leads that can be optimised and further developed into candidate drugs. With the development of new robotic technologies, the capacity to test the activities of compounds has increased considerably in recent years. Traditional methods of analysing relationships between measured activities and compound structure, such as inspecting tables and graphical plots, are not feasible for a large HTS dataset. Instead, data visualisation provides a method for analysing such large datasets, especially high-dimensional ones. A few visualisation techniques for drug design have been developed so far, but most of them cope with only several compound properties at a time. We believe that a latent variable model with a non-linear mapping from the latent space to the data space is a preferred choice for visualising a complex high-dimensional data set. As a type of latent variable model, the latent trait model (LTM) can deal with either continuous or discrete data, which makes it particularly useful in this domain. In addition, with the aid of differential geometry, the distribution of the data can be inferred from magnification factor and curvature plots. Rather than extracting useful information from a single plot, a hierarchical LTM arranges a set of LTMs and their corresponding plots in a tree structure. We model the whole data set with an LTM at the top level, which is broken down into clusters at deeper levels of the hierarchy. In this manner, refined visualisation plots can be displayed at deeper levels and sub-clusters may be found.
The hierarchy of LTMs is trained using the expectation-maximisation (EM) algorithm to maximise its likelihood with respect to the data sample. Training proceeds interactively and recursively, top-down: the user subjectively identifies interesting regions on the visualisation plot that they would like to model in greater detail. At each stage of hierarchical LTM construction, the EM algorithm alternates between the E- and M-steps. Another problem that can occur when visualising a large data set is that data clusters may overlap significantly, making it very difficult for the user to judge where the centres of regions of interest should be placed. We address this problem by employing the minimum message length (MML) technique, which can help the user decide the optimal structure of the model. In this thesis we also demonstrate the applicability of the hierarchy of latent trait models to document data mining.
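The E-step mentioned above can be sketched for the simplest case: binary data, a fixed grid of latent points, and a uniform prior over the grid. Each grid point k is assumed to carry Bernoulli parameters p_k (in an LTM these would come from the learned non-linear mapping, which is not shown), and each data point is plotted at its posterior-mean latent position. The M-step and the hierarchy itself are omitted; all names are hypothetical.

```python
import math
from typing import List, Sequence


def bernoulli_log_lik(x: Sequence[int], p: Sequence[float]) -> float:
    """log p(x | grid point) for independent binary features; needs 0 < p_i < 1."""
    return sum(xi * math.log(pi) + (1 - xi) * math.log(1 - pi)
               for xi, pi in zip(x, p))


def responsibilities(x: Sequence[int], grid_params: Sequence[Sequence[float]]) -> List[float]:
    """E-step: posterior over the K latent grid points, assuming a uniform prior."""
    logs = [bernoulli_log_lik(x, p) for p in grid_params]
    top = max(logs)  # subtract the maximum for numerical stability
    w = [math.exp(lg - top) for lg in logs]
    total = sum(w)
    return [wi / total for wi in w]


def posterior_mean(r: Sequence[float], grid: Sequence[Sequence[float]]) -> List[float]:
    """Place a data point on the 2-D plot at its posterior-mean latent position."""
    return [sum(rk * zk[i] for rk, zk in zip(r, grid)) for i in range(2)]


# Hypothetical two-point latent grid and a binary fingerprint with two bits.
grid = [(-1.0, 0.0), (1.0, 0.0)]
params = [(0.9, 0.9), (0.1, 0.1)]
r = responsibilities([1, 1], params)
print(r, posterior_mean(r, grid))
```

In a hierarchical model, the same responsibilities (multiplied by those of the parent model) weight each data point when a child LTM is trained on a selected region.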
Abstract:
This thesis presents an investigation into the application of methods of uncertain reasoning to the biological classification of river water quality. Existing biological methods for reporting river water quality are critically evaluated, and the adoption of a discrete biological classification scheme is advocated. Reasoning methods for managing uncertainty are explained, with the Bayesian and Dempster-Shafer calculi cited as the primary numerical schemes. The elicitation of qualitative knowledge on benthic invertebrates is described. The specificity of benthic response to changes in water quality leads to the adoption of a sensor model of data interpretation, in which a reference set of taxa provides probabilistic support for the biological classes. The significance of sensor states, including that of absence, is shown. Novel techniques for directly eliciting the required uncertainty measures are presented. Bayesian and Dempster-Shafer calculi were used to combine the evidence provided by the sensors. The performance of these automatic classifiers was compared with the expert's own discrete classification of sampled sites. Variations in sensor data weighting, combination order, and belief representation were examined for their effect on classification performance. The behaviour of the calculi under evidential conflict and alternative combination rules was investigated. Small variations in evidential weight, and the inclusion of evidence from sensors absent from a sample, improved the classification performance of Bayesian belief and of support for singleton hypotheses. For simple support, the inclusion of absent evidence decreased the classification rate. The performance of Dempster-Shafer classification using consonant belief functions was comparable to that of Bayesian and singleton belief.
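To show how evidence from two sensor taxa can be combined, here is a minimal sketch of Dempster's rule of combination over a hypothetical two-class frame. Mass assigned to the empty intersection is treated as conflict and removed by renormalisation, which is the standard form of the rule; the class names and mass values are invented for illustration, and the thesis's elicitation and alternative combination rules are not shown.

```python
from itertools import product
from typing import Dict, FrozenSet

Mass = Dict[FrozenSet[str], float]


def dempster_combine(m1: Mass, m2: Mass) -> Mass:
    """Combine two basic probability assignments with Dempster's rule."""
    combined: Mass = {}
    conflict = 0.0
    for (a, wa), (b, wb) in product(m1.items(), m2.items()):
        inter = a & b
        if inter:
            combined[inter] = combined.get(inter, 0.0) + wa * wb
        else:
            conflict += wa * wb  # mass that would fall on the empty set
    if conflict >= 1.0:
        raise ValueError("total conflict: Dempster's rule is undefined")
    return {s: w / (1.0 - conflict) for s, w in combined.items()}


# Hypothetical frame of discernment with two water-quality classes.
GOOD, POOR = frozenset({"good"}), frozenset({"poor"})
EITHER = GOOD | POOR
taxon_1 = {GOOD: 0.6, EITHER: 0.4}  # evidence from one sensor taxon
taxon_2 = {POOR: 0.5, EITHER: 0.5}  # evidence from another sensor taxon
print(dempster_combine(taxon_1, taxon_2))
```

Here 0.3 of the joint mass is conflicting, so the surviving masses 0.3, 0.2, and 0.2 are each divided by 0.7.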
Recommendations are made for further work in biological classification using uncertain reasoning methods, including the combination of multiple-expert opinion, the use of Bayesian networks, and the integration of classification software within a decision support system for water quality assessment.
Abstract:
Classification of metamorphic rocks is normally carried out using a poorly defined, subjective classification scheme, making this an area in which many undergraduate geologists experience difficulties. An expert system to assist in such classification is presented, which is capable of classifying rocks and also giving further details about a particular rock type. A mixed knowledge representation is used, with frame, semantic net, and production rule systems available. Classification in the domain requires that different facets of a rock be classified. To implement this, rocks are represented by 'context' frames with slots representing each facet. Slots are satisfied by calling a predefined ruleset to carry out the necessary inference. The inference is handled by an interpreter that uses a dependency graph representation for the propagation of evidence. Uncertainty is handled using a combination of the MYCIN certainty factor system and the Dempster-Shafer range mechanism. This allows both positive and negative reasoning, with rules capable of representing the necessity and sufficiency of evidence, while also allowing an alpha-beta pruning algorithm to be implemented to guide question selection during inference. The system also uses a semantic net structure to let the expert encode simple relationships between terms, so that rules can be written at a sensible level of abstraction. Using frames to represent rock types where subclassification is possible allows the knowledge base to be built in a modular fashion, with subclassification frames defined only once the higher level of classification is functioning. Rulesets can similarly be added in a modular fashion, and because the individual rules are essentially declarative, updating and maintenance are simple.
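The certainty-factor side of the uncertainty handling can be illustrated with the parallel-combination rule commonly used in EMYCIN-style systems, sketched below. Whether the thesis used exactly this form, and how it is merged with the Dempster-Shafer range mechanism, is not stated in the abstract; `combine_cf` is a hypothetical name.

```python
def combine_cf(cf1: float, cf2: float) -> float:
    """Combine two certainty factors in [-1, 1] bearing on the same hypothesis
    (EMYCIN-style parallel combination)."""
    if not (-1.0 <= cf1 <= 1.0 and -1.0 <= cf2 <= 1.0):
        raise ValueError("certainty factors must lie in [-1, 1]")
    if cf1 >= 0 and cf2 >= 0:
        return cf1 + cf2 * (1 - cf1)  # both confirm
    if cf1 < 0 and cf2 < 0:
        return cf1 + cf2 * (1 + cf1)  # both disconfirm
    denom = 1 - min(abs(cf1), abs(cf2))  # conflicting evidence
    if denom == 0:
        raise ValueError("CFs of +1 and -1 cannot be combined")
    return (cf1 + cf2) / denom


# Hypothetical rule firings bearing on one rock facet.
print(combine_cf(0.6, 0.5))   # two confirming rules
print(combine_cf(0.6, -0.4))  # one confirming, one disconfirming
```

Separate handling of positive and negative evidence is what lets rules express both support for and against a classification.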
The knowledge base so far developed for metamorphic classification serves to demonstrate the performance of the interpreter design, while also moving some way towards providing a useful assistant to the non-expert metamorphic petrologist. The system demonstrates the possibilities for a fully developed knowledge base to handle the classification of igneous, sedimentary, and metamorphic rocks. The current knowledge base and interpreter have been evaluated by potential users and experts. The results of the evaluation show that the system performs to an acceptable level and should be of use as a tool for both undergraduates and researchers from outside the metamorphic petrography field.
Abstract:
The process framework comprises three phases: scope the supply chain/network; identify the options for supply system architecture; and select the supply system architecture. It provides a structured approach that analyses the contextual characteristics of the supply chain/network in order to ensure alignment with the appropriate supply system architecture. The process framework was derived from a comprehensive literature review and archival case study analysis. The review led to the classification of supply system architectures according to their orientation: integrated, partially integrated, co-ordinated, or independent. This classification was combined with the characteristics that influence the selection of supply system architecture to form the conceptual framework. The framework builds upon existing frameworks and methodologies by focusing on a structured procedure, supporting project management, facilitating participation, and clarifying the point of entry. The process framework was initially tested in three case study applications from the food, automobile, and hand tool industries; a variety of industrial settings was chosen to illustrate transferability. The case study applications indicate that the process framework is a valid approach to the problem; however, further testing is required. In particular, the use of group support system technologies to support the process, and the steps involving the participation of software vendors, need further testing. Nevertheless, the process framework is easy to follow owing to the clarity of its presentation. It addresses the issue of timing by including alternative decision-making techniques, depending on the constraints. It is useful for ensuring that a sound business case is developed, with supporting documentation and analysis that identifies the strategic and functional requirements of the supply system architecture.
Abstract:
Hierarchical knowledge structures are frequently used within clinical decision support systems as part of the model for generating intelligent advice. The nodes in the hierarchy inevitably have varying influence on the decision-making processes, which needs to be reflected by parameters. If the model has been elicited from human experts, it is not feasible to ask them to estimate the parameters, because there will be so many even in moderately sized structures. This paper describes how the parameters can be obtained from data instead, using only a small number of cases. The original method [1] is applied to a particular web-based clinical decision support system called GRiST, which uses its hierarchical knowledge to quantify the risks associated with mental-health problems. The knowledge was elicited from multidisciplinary mental-health practitioners, but the tree has several thousand nodes, all requiring an estimate of their relative influence on the assessment process. The method described in the paper shows how these can be obtained from about 200 cases instead. It greatly reduces the experts' elicitation tasks and has the potential to be generalised to similar knowledge-engineering domains where relative weightings of node siblings are part of the parameter space.
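To make "relative weightings of node siblings" concrete, the sketch below aggregates leaf judgements up a tree, normalising each node's sibling weights to sum to one. The weighted-sum aggregation, the node names, and the values are all assumptions for illustration; this is not GRiST's actual scoring scheme, and the paper's method for estimating the weights from about 200 cases is not shown.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    name: str
    weight: float = 1.0            # relative influence among siblings
    value: Optional[float] = None  # judgement in [0, 1], leaves only
    children: List["Node"] = field(default_factory=list)


def evaluate(node: Node) -> float:
    """Aggregate leaf values bottom-up; sibling weights are normalised to sum to 1."""
    if not node.children:
        if node.value is None:
            raise ValueError(f"leaf {node.name!r} has no value")
        return node.value
    total = sum(child.weight for child in node.children)
    return sum(child.weight / total * evaluate(child) for child in node.children)


# Hypothetical three-level fragment of a risk tree.
risk = Node("risk", children=[
    Node("history", weight=3.0, value=1.0),
    Node("current state", weight=1.0, children=[
        Node("mood", value=0.2),
        Node("sleep", value=0.6),
    ]),
])
print(evaluate(risk))  # 0.75 * 1.0 + 0.25 * 0.4
```

Every internal node contributes one set of sibling weights, which is why a tree with several thousand nodes has far too many parameters to elicit by hand.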
Abstract:
This paper describes an innovative sensing approach that allows transients in gait to be captured, discriminated, and classified automatically. A walking platform is described that offers an alternative design to standard force plates, with advantages that include mechanical simplicity and fewer restrictions on dimensions. The work experimentally investigates the sensitivity of the distributive tactile sensing method, which has the potential to add flexibility to gait assessment, including patient targeting and extension to a variety of ambulatory applications. Plate deflection is measured with infrared sensors, and the resulting gait patterns are compared with stored templates using a pattern recognition algorithm. This information is input into a neural network to classify normal and affected walking events, achieving a classification accuracy of just under 90 per cent. The system has potential applications in gait analysis and rehabilitation, where it can be used as a tool for the early diagnosis of walking disorders or to determine changes between pre- and post-operative gait.
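The template-comparison step can be pictured as a nearest-template match. The sketch below uses Euclidean distance between deflection profiles, which is an assumed stand-in: the paper's actual pattern recognition algorithm and its neural network are not specified in the abstract. The profiles are hypothetical and assumed to be resampled to equal length.

```python
import math
from typing import Dict, Sequence


def euclidean(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two equal-length profiles."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def nearest_template(sample: Sequence[float], templates: Dict[str, Sequence[float]]) -> str:
    """Return the label of the stored template closest to the sample profile."""
    return min(templates, key=lambda label: euclidean(sample, templates[label]))


# Hypothetical deflection profiles, resampled to five points per footstep.
templates = {
    "normal":   [0.0, 1.0, 2.0, 1.0, 0.0],
    "affected": [0.0, 2.0, 0.0, 2.0, 0.0],
}
print(nearest_template([0.0, 1.1, 1.9, 1.0, 0.1], templates))  # normal
```

The distances to each template could also serve as input features for a downstream classifier such as the neural network mentioned above.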
Abstract:
The computer simulation of manufacturing systems is commonly carried out using discrete event simulation (DES). By contrast, there appears to be a lack of applications of continuous simulation methods, particularly system dynamics (SD), despite evidence that this technique is suitable for industrial modelling. This paper investigates whether this is due to a decline in the general popularity of SD, or whether the modelling of manufacturing systems represents a missed opportunity for SD. To this end, the paper first reviews the concept of SD and describes the modelling technique in full. A survey of the published applications of SD in the 1990s is then made by developing and using a structured classification approach. From this review, observations are made about the application of the SD method, and opportunities for future research are suggested.
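As a contrast with discrete event simulation, the core of a system dynamics model is a set of stocks updated continuously by flow rates. The sketch below integrates one stock with a goal-seeking inflow and a constant outflow using Euler steps; the inventory interpretation, the parameter names, and the values are hypothetical, and real SD tools offer richer integrators and model structures.

```python
from typing import List


def simulate_stock(initial: float, target: float, adjustment_time: float,
                   outflow: float, dt: float, steps: int) -> List[float]:
    """Euler integration of d(stock)/dt = inflow - outflow, where the inflow is a
    goal-seeking rate (target - stock) / adjustment_time."""
    levels = [initial]
    stock = initial
    for _ in range(steps):
        inflow = (target - stock) / adjustment_time  # rate equation
        stock += (inflow - outflow) * dt             # level (stock) equation
        levels.append(stock)
    return levels


# Hypothetical inventory (units) with production chasing a target of 100 units.
print(simulate_stock(initial=0.0, target=100.0, adjustment_time=2.0,
                     outflow=0.0, dt=1.0, steps=3))  # [0.0, 50.0, 75.0, 87.5]
```

With a constant outflow of 10 units per period, the same model settles where inflow equals outflow, at 100 - 10 * 2 = 80 units, illustrating the steady-state reasoning typical of SD analyses.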