39 resultados para hierarchical classification system
Resumo:
The number of new chemical entities (NCE) is increasing every day after the introduction of combinatorial chemistry and high throughput screening to the drug discovery cycle. One third of these new compounds have aqueous solubility less than 20µg/mL [1]. Therefore, a great deal of interest has been forwarded to the salt formation technique to overcome solubility limitations. This study aims to improve the drug solubility of a Biopharmaceutical Classification System class II (BCS II) model drug (Indomethacin; IND) using basic amino acids (L-arginine, L-lysine and L-histidine) as counterions. Three new salts were prepared using freeze drying method and characterised by FT-IR spectroscopy, proton nuclear magnetic resonance ((1)HNMR), Differential Scanning Calorimetry (DSC) and Thermogravimetric analysis (TGA). The effect of pH on IND solubility was also investigated using pH-solubility profile. Both arginine and lysine formed novel salts with IND, while histidine failed to dissociate the free acid and in turn no salt was formed. Arginine and lysine increased IND solubility by 10,000 and 2296 fold, respectively. An increase in dissolution rate was also observed for the novel salts. Since these new salts have improved IND solubility to that similar to BCS class I drugs, IND salts could be considered for possible waivers of bioequivalence.
Resumo:
Aim: Identify the incidence of vitreomacular traction (VMT) and frequency of reduced vision in the absence of other coexisting macular pathology using a pragmatic classification system for VMT in a population of patients referred to the hospital eye service. Methods: A detailed survey of consecutive optical coherence tomography (OCT) scans was done in a high-throughput ocular imaging service to ascertain cases of vitreomacular adhesion (VMA) and VMT using a departmental classification system. Analysis was done on the stages of traction, visual acuity, and association with other macular conditions. Results: In total, 4384 OCT scan episodes of 2223 patients were performed. Two hundred and fourteen eyes had VMA/VMT, with 112 eyes having coexisting macular pathology. Of 102 patients without coexisting pathology, 57 patients had VMT grade between 2 and 8, with a negative correlation between VMT grade and number of Snellen lines (r= -0.61717). There was a distinct cutoff in visual function when VMT grade was higher than 4 with the presence of cysts and sub retinal separation and breaks in the retinal layers. Conclusions: VMT is a common encounter often associated with other coexisting macular pathology. We estimated an incidence rate of 0.01% of VMT cases with reduced vision and without coexisting macular pathology that may potentially benefit from intervention. Grading of VMT to select eyes with cyst formation as well as hole formation may be useful for targeting patients who are at higher risk of visual loss from VMT.
Resumo:
Computer-Based Learning systems of one sort or another have been in existence for almost 20 years, but they have yet to achieve real credibility within Commerce, Industry or Education. A variety of reasons could be postulated for this, typically: - cost - complexity - inefficiency - inflexibility - tedium Obviously different systems deserve different levels and types of criticism, but it still remains true that Computer-Based Learning (CBL) is falling significantly short of its potential. Experience of a small, but highly successful CBL system within a large, geographically distributed industry (the National Coal Board) prompted an investigation into currently available packages, the original intention being to purchase the most suitable software and run it on existing computer hardware, alongside existing software systems. It became apparent that none of the available CBL packages were suitable, and a decision was taken to develop an in-house Computer-Assisted Instruction system according to the following criteria: - cheap to run; - easy to author course material; - easy to use; - requires no computing knowledge to use (as either an author or student) ; - efficient in the use of computer resources; - has a comprehensive range of facilities at all levels. This thesis describes the initial investigation, resultant observations and the design, development and implementation of the SCHOOL system. One of the principal characteristics c£ SCHOOL is that it uses a hierarchical database structure for the storage of course material - thereby providing inherently a great deal of the power, flexibility and efficiency originally required. Trials using the SCHOOL system on IBM 303X series equipment are also detailed, along with proposed and current development work on what is essentially an operational CBL system within a large-scale Industrial environment.
Resumo:
Biological experiments often produce enormous amount of data, which are usually analyzed by data clustering. Cluster analysis refers to statistical methods that are used to assign data with similar properties into several smaller, more meaningful groups. Two commonly used clustering techniques are introduced in the following section: principal component analysis (PCA) and hierarchical clustering. PCA calculates the variance between variables and groups them into a few uncorrelated groups or principal components (PCs) that are orthogonal to each other. Hierarchical clustering is carried out by separating data into many clusters and merging similar clusters together. Here, we use an example of human leukocyte antigen (HLA) supertype classification to demonstrate the usage of the two methods. Two programs, Generating Optimal Linear Partial Least Square Estimations (GOLPE) and Sybyl, are used for PCA and hierarchical clustering, respectively. However, the reader should bear in mind that the methods have been incorporated into other software as well, such as SIMCA, statistiXL, and R.
Resumo:
The data available during the drug discovery process is vast in amount and diverse in nature. To gain useful information from such data, an effective visualisation tool is required. To provide better visualisation facilities to the domain experts (screening scientist, biologist, chemist, etc.),we developed a software which is based on recently developed principled visualisation algorithms such as Generative Topographic Mapping (GTM) and Hierarchical Generative Topographic Mapping (HGTM). The software also supports conventional visualisation techniques such as Principal Component Analysis, NeuroScale, PhiVis, and Locally Linear Embedding (LLE). The software also provides global and local regression facilities . It supports regression algorithms such as Multilayer Perceptron (MLP), Radial Basis Functions network (RBF), Generalised Linear Models (GLM), Mixture of Experts (MoE), and newly developed Guided Mixture of Experts (GME). This user manual gives an overview of the purpose of the software tool, highlights some of the issues to be taken care while creating a new model, and provides information about how to install & use the tool. The user manual does not require the readers to have familiarity with the algorithms it implements. Basic computing skills are enough to operate the software.
Resumo:
An interactive hierarchical Generative Topographic Mapping (HGTM) ¸iteHGTM has been developed to visualise complex data sets. In this paper, we build a more general visualisation system by extending the HGTM visualisation system in 3 directions: bf (1) We generalize HGTM to noise models from the exponential family of distributions. The basic building block is the Latent Trait Model (LTM) developed in ¸iteKabanpami. bf (2) We give the user a choice of initializing the child plots of the current plot in either em interactive, or em automatic mode. In the interactive mode the user interactively selects ``regions of interest'' as in ¸iteHGTM, whereas in the automatic mode an unsupervised minimum message length (MML)-driven construction of a mixture of LTMs is employed. bf (3) We derive general formulas for magnification factors in latent trait models. Magnification factors are a useful tool to improve our understanding of the visualisation plots, since they can highlight the boundaries between data clusters. The unsupervised construction is particularly useful when high-level plots are covered with dense clusters of highly overlapping data projections, making it difficult to use the interactive mode. Such a situation often arises when visualizing large data sets. We illustrate our approach on a toy example and apply our system to three more complex real data sets.
Resumo:
Today, the data available to tackle many scientific challenges is vast in quantity and diverse in nature. The exploration of heterogeneous information spaces requires suitable mining algorithms as well as effective visual interfaces. miniDVMS v1.8 provides a flexible visual data mining framework which combines advanced projection algorithms developed in the machine learning domain and visual techniques developed in the information visualisation domain. The advantage of this interface is that the user is directly involved in the data mining process. Principled projection methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), are integrated with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates, and user interaction facilities, to provide this integrated visual data mining framework. The software also supports conventional visualisation techniques such as principal component analysis (PCA), Neuroscale, and PhiVis. This user manual gives an overview of the purpose of the software tool, highlights some of the issues to be taken care while creating a new model, and provides information about how to install and use the tool. The user manual does not require the readers to have familiarity with the algorithms it implements. Basic computing skills are enough to operate the software.
Resumo:
National meteorological offices are largely concerned with synoptic-scale forecasting where weather predictions are produced for a whole country for 24 hours ahead. In practice, many local organisations (such as emergency services, construction industries, forestry, farming, and sports) require only local short-term, bespoke, weather predictions and warnings. This thesis shows that the less-demanding requirements do not require exceptional computing power and can be met by a modern, desk-top system which monitors site-specific ground conditions (such as temperature, pressure, wind speed and direction, etc) augmented with above ground information from satellite images to produce `nowcasts'. The emphasis in this thesis has been towards the design of such a real-time system for nowcasting. Local site-specific conditions are monitored using a custom-built, stand alone, Motorola 6809 based sub-system. Above ground information is received from the METEOSAT 4 geo-stationary satellite using a sub-system based on a commercially available equipment. The information is ephemeral and must be captured in real-time. The real-time nowcasting system for localised weather handles the data as a transparent task using the limited capabilities of the PC system. Ground data produces a time series of measurements at a specific location which represents the past-to-present atmospheric conditions of the particular site from which much information can be extracted. The novel approach adopted in this thesis is one of constructing stochastic models based on the AutoRegressive Integrated Moving Average (ARIMA) technique. The satellite images contain features (such as cloud formations) which evolve dynamically and may be subject to movement, growth, distortion, bifurcation, superposition, or elimination between images. The process of extracting a weather feature, following its motion and predicting its future evolution involves algorithms for normalisation, partitioning, filtering, image enhancement, and correlation of multi-dimensional signals in different domains. To limit the processing requirements, the analysis in this thesis concentrates on an `area of interest'. By this rationale, only a small fraction of the total image needs to be processed, leading to a major saving in time. The thesis also proposes an extention to an existing manual cloud classification technique for its implementation in automatically classifying a cloud feature over the `area of interest' for nowcasting using the multi-dimensional signals.
Resumo:
The G-protein coupled receptors--or GPCRs--comprise simultaneously one of the largest and one of the most multi-functional protein families known to modern-day molecular bioscience. From a drug discovery and pharmaceutical industry perspective, the GPCRs constitute one of the most commercially and economically important groups of proteins known. The GPCRs undertake numerous vital metabolic functions and interact with a hugely diverse range of small and large ligands. Many different methodologies have been developed to efficiently and accurately classify the GPCRs. These range from motif-based techniques to machine learning as well as a variety of alignment-free techniques based on the physiochemical properties of sequences. We review here the available methodologies for the classification of GPCRs. Part of this work focuses on how we have tried to build the intrinsically hierarchical nature of sequence relations, implicit within the family, into an adaptive approach to classification. Importantly, we also allude to some of the key innate problems in developing an effective approach to classifying the GPCRs: the lack of sequence similarity between the six classes that comprise the GPCR family and the low sequence similarity to other family members evinced by many newly revealed members of the family.
Resumo:
The IRDS standard is an international standard produced by the International Organisation for Standardisation (ISO). In this work the process for producing standards in formal standards organisations, for example the ISO, and in more informal bodies, for example the Object Management Group (OMG), is examined. This thesis examines previous models and classifications of standards. The previous models and classifications are then combined to produce a new classification. The IRDS standard is then placed in a class in the new model as a reference anticipatory standard. Anticipatory standards are standards which are developed ahead of the technology in order to attempt to guide the market. The diffusion of the IRDS is traced over a period of eleven years. The economic conditions which affect the diffusion of standards are examined, particularly the economic conditions which prevail in compatibility markets such as the IT and ICT markets. Additionally the consequences of the introduction of gateway or converter devices into a market where a standard has not yet been established is examined. The IRDS standard did not have an installed base and this hindered its diffusion. The thesis concludes that the IRDS standard was overtaken by new developments such as object oriented technologies and middleware. This was partly because of the slow development process of developing standards in traditional organisations which operate on a consensus basis and partly because the IRDS standard did not have an installed base. Also the rise and proliferation of middleware products resulted in exchange mechanisms becoming dominant rather than repository solutions. The research method used in this work is a longitudinal study of the development and diffusion of the ISO/EEC IRDS standard. The research is regarded as a single case study and follows the interpretative epistemological point of view.
Resumo:
The absence of a definitive approach to the design of manufacturing systems signifies the importance of a control mechanism to ensure the timely application of relevant design techniques. To provide effective control, design development needs to be continually assessed in relation to the required system performance, which can only be achieved analytically through computer simulation. The technique providing the only method of accurately replicating the highly complex and dynamic interrelationships inherent within manufacturing facilities and realistically predicting system behaviour. Owing to the unique capabilities of computer simulation, its application should support and encourage a thorough investigation of all alternative designs. Allowing attention to focus specifically on critical design areas and enabling continuous assessment of system evolution. To achieve this system analysis needs to efficient, in terms of data requirements and both speed and accuracy of evaluation. To provide an effective control mechanism a hierarchical or multi-level modelling procedure has therefore been developed, specifying the appropriate degree of evaluation support necessary at each phase of design. An underlying assumption of the proposal being that evaluation is quick, easy and allows models to expand in line with design developments. However, current approaches to computer simulation are totally inappropriate to support the hierarchical evaluation. Implementation of computer simulation through traditional approaches is typically characterized by a requirement for very specialist expertise, a lengthy model development phase, and a correspondingly high expenditure. Resulting in very little and rather inappropriate use of the technique. Simulation, when used, is generally only applied to check or verify a final design proposal. Rarely is the full potential of computer simulation utilized to aid, support or complement the manufacturing system design procedure. To implement the proposed modelling procedure therefore the concept of a generic simulator was adopted, as such systems require no specialist expertise, instead facilitating quick and easy model creation, execution and modification, through simple data inputs. Previously generic simulators have tended to be too restricted, lacking the necessary flexibility to be generally applicable to manufacturing systems. Development of the ATOMS manufacturing simulator, however, has proven that such systems can be relevant to a wide range of applications, besides verifying the benefits of multi-level modelling.
Resumo:
This thesis deals with the problem of Information Systems design for Corporate Management. It shows that the results of applying current approaches to Management Information Systems and Corporate Modelling fully justify a fresh look to the problem. The thesis develops an approach to design based on Cybernetic principles and theories. It looks at Management as an informational process and discusses the relevance of regulation theory to its practice. The work proceeds around the concept of change and its effects on the organization's stability and survival. The idea of looking at organizations as viable systems is discussed and a design to enhance survival capacity is developed. It takes Ashby's theory of adaptation and developments on ultra-stability as a theoretical framework and considering conditions for learning and foresight deduces that a design should include three basic components: A dynamic model of the organization- environment relationships; a method to spot significant changes in the value of the essential variables and in a certain set of parameters; and a Controller able to conceive and change the other two elements and to make choices among alternative policies. Further considerations of the conditions for rapid adaptation in organisms composed of many parts, and the law of Requisite Variety determine that successful adaptive behaviour requires certain functional organization. Beer's model of viable organizations is put in relation to Ashby's theory of adaptation and regulation. The use of the Ultra-stable system as abstract unit of analysis permits developing a rigorous taxonomy of change; it starts distinguishing between change with in behaviour and change of behaviour to complete the classification with organizational change. It relates these changes to the logical categories of learning connecting the topic of Information System design with that of organizational learning.
Resumo:
This thesis applies a hierarchical latent trait model system to a large quantity of data. The motivation for it was lack of viable approaches to analyse High Throughput Screening datasets which maybe include thousands of data points with high dimensions. High Throughput Screening (HTS) is an important tool in the pharmaceutical industry for discovering leads which can be optimised and further developed into candidate drugs. Since the development of new robotic technologies, the ability to test the activities of compounds has considerably increased in recent years. Traditional methods, looking at tables and graphical plots for analysing relationships between measured activities and the structure of compounds, have not been feasible when facing a large HTS dataset. Instead, data visualisation provides a method for analysing such large datasets, especially with high dimensions. So far, a few visualisation techniques for drug design have been developed, but most of them just cope with several properties of compounds at one time. We believe that a latent variable model (LTM) with a non-linear mapping from the latent space to the data space is a preferred choice for visualising a complex high-dimensional data set. As a type of latent variable model, the latent trait model can deal with either continuous data or discrete data, which makes it particularly useful in this domain. In addition, with the aid of differential geometry, we can imagine the distribution of data from magnification factor and curvature plots. Rather than obtaining the useful information just from a single plot, a hierarchical LTM arranges a set of LTMs and their corresponding plots in a tree structure. We model the whole data set with a LTM at the top level, which is broken down into clusters at deeper levels of t.he hierarchy. In this manner, the refined visualisation plots can be displayed in deeper levels and sub-clusters may be found. Hierarchy of LTMs is trained using expectation-maximisation (EM) algorithm to maximise its likelihood with respect to the data sample. Training proceeds interactively in a recursive fashion (top-down). The user subjectively identifies interesting regions on the visualisation plot that they would like to model in a greater detail. At each stage of hierarchical LTM construction, the EM algorithm alternates between the E- and M-step. Another problem that can occur when visualising a large data set is that there may be significant overlaps of data clusters. It is very difficult for the user to judge where centres of regions of interest should be put. We address this problem by employing the minimum message length technique, which can help the user to decide the optimal structure of the model. In this thesis we also demonstrate the applicability of the hierarchy of latent trait models in the field of document data mining.
Resumo:
This thesis presents an investigation into the application of methods of uncertain reasoning to the biological classification of river water quality. Existing biological methods for reporting river water quality are critically evaluated, and the adoption of a discrete biological classification scheme advocated. Reasoning methods for managing uncertainty are explained, in which the Bayesian and Dempster-Shafer calculi are cited as primary numerical schemes. Elicitation of qualitative knowledge on benthic invertebrates is described. The specificity of benthic response to changes in water quality leads to the adoption of a sensor model of data interpretation, in which a reference set of taxa provide probabilistic support for the biological classes. The significance of sensor states, including that of absence, is shown. Novel techniques of directly eliciting the required uncertainty measures are presented. Bayesian and Dempster-Shafer calculi were used to combine the evidence provided by the sensors. The performance of these automatic classifiers was compared with the expert's own discrete classification of sampled sites. Variations of sensor data weighting, combination order and belief representation were examined for their effect on classification performance. The behaviour of the calculi under evidential conflict and alternative combination rules was investigated. Small variations in evidential weight and the inclusion of evidence from sensors absent from a sample improved classification performance of Bayesian belief and support for singleton hypotheses. For simple support, inclusion of absent evidence decreased classification rate. The performance of Dempster-Shafer classification using consonant belief functions was comparable to Bayesian and singleton belief. Recommendations are made for further work in biological classification using uncertain reasoning methods, including the combination of multiple-expert opinion, the use of Bayesian networks, and the integration of classification software within a decision support system for water quality assessment.
Resumo:
Classification of metamorphic rocks is normally carried out using a poorly defined, subjective classification scheme making this an area in which many undergraduate geologists experience difficulties. An expert system to assist in such classification is presented which is capable of classifying rocks and also giving further details about a particular rock type. A mixed knowledge representation is used with frame, semantic and production rule systems available. Classification in the domain requires that different facets of a rock be classified. To implement this, rocks are represented by 'context' frames with slots representing each facet. Slots are satisfied by calling a pre-defined ruleset to carry out the necessary inference. The inference is handled by an interpreter which uses a dependency graph representation for the propagation of evidence. Uncertainty is handled by the system using a combination of the MYCIN certainty factor system and the Dempster-Shafer range mechanism. This allows for positive and negative reasoning, with rules capable of representing necessity and sufficiency of evidence, whilst also allowing the implementation of an alpha-beta pruning algorithm to guide question selection during inference. The system also utilizes a semantic net type structure to allow the expert to encode simple relationships between terms enabling rules to be written with a sensible level of abstraction. Using frames to represent rock types where subclassification is possible allows the knowledge base to be built in a modular fashion with subclassification frames only defined once the higher level of classification is functioning. Rulesets can similarly be added in modular fashion with the individual rules being essentially declarative allowing for simple updating and maintenance. The knowledge base so far developed for metamorphic classification serves to demonstrate the performance of the interpreter design whilst also moving some way towards providing a useful assistant to the non-expert metamorphic petrologist. The system demonstrates the possibilities for a fully developed knowledge base to handle the classification of igneous, sedimentary and metamorphic rocks. The current knowledge base and interpreter have been evaluated by potential users and experts. The results of the evaluation show that the system performs to an acceptable level and should be of use as a tool for both undergraduates and researchers from outside the metamorphic petrography field. .