18 results for Information Discovery Paradigm,
in Aston University Research Archive
Abstract:
Multidimensional compound optimization is a new paradigm in the drug discovery process, yielding efficiencies during early stages and reducing attrition in the later stages of drug development. The success of this strategy relies heavily on understanding this multidimensional data and extracting useful information from it. This paper demonstrates how principled visualization algorithms can be used to understand and explore a large data set created in the early stages of drug discovery. The experiments presented are performed on a real-world data set comprising biological activity data and some whole-molecular physicochemical properties. Data visualization is a popular way of presenting complex data in a simpler form. We have applied powerful principled visualization methods, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), to help the domain experts (screening scientists, chemists, biologists, etc.) understand the data and make meaningful decisions. We also benchmark these principled methods against better-known visualization approaches, namely principal component analysis (PCA), Sammon's mapping, and self-organizing maps (SOMs), to demonstrate their greater power in helping the user visualize the large multidimensional data sets encountered during the early stages of the drug discovery process. The reported results clearly show that the GTM and HGTM algorithms allow the user to cluster active compounds for different targets and to understand them better than the benchmarks do. An interactive software tool supporting these visualization algorithms was provided to the domain experts. The tool helps the domain experts explore the projections obtained from the visualization algorithms, providing facilities such as parallel coordinate plots, magnification factors, directional curvatures, and integration with industry-standard software. © 2006 American Chemical Society.
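PCA is the simplest of the benchmarks named above. As a minimal sketch of that linear baseline only (not of GTM or HGTM, which fit a constrained Gaussian mixture and need considerably more code), a 2-D projection can be computed from the singular value decomposition of the centred data matrix. The random matrix is a hypothetical stand-in for compound descriptors, and `pca_project` is an illustrative name.

```python
import numpy as np

def pca_project(X, n_components=2):
    """Project the rows of X onto the leading principal components."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # centre each descriptor
    # Rows of Vt are the principal directions, ordered by decreasing
    # singular value (i.e. by decreasing explained variance).
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    scores = Xc @ Vt[:n_components].T            # low-dimensional coordinates for plotting
    explained = S**2 / np.sum(S**2)              # fraction of variance per component
    return scores, explained[:n_components]

# Hypothetical stand-in data: 200 compounds x 10 physicochemical properties.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 10))
scores, ratio = pca_project(X)
print(scores.shape, ratio)
```

Because PCA is a single global linear map, it cannot bend to follow curved or clustered structure in the data, which is the kind of limitation the non-linear GTM-based methods are compared against.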
Abstract:
In this paper, we propose a text mining method called LRD (latent relation discovery), which extends the traditional vector space model of document representation in order to improve information retrieval (IR) on documents and document clustering. Our LRD method extracts terms and entities, such as person, organization, or project names, and discovers relationships between them by taking into account their co-occurrence in textual corpora. Given a target entity, LRD effectively and efficiently discovers other entities closely related to the target. Based on this relatedness, a measure of relation strength between entities is defined. LRD uses relation strength to enhance the vector space model, and uses the enhanced model for query-based IR on documents and for document clustering, in order to discover complex relationships among terms and entities. Our experiments on a standard dataset for query-based IR show that our LRD method performed significantly better than the traditional vector space model and five other standard statistical methods for vector expansion.
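The abstract does not give LRD's relation-strength formula. The following is a minimal sketch of the general idea, co-occurrence-based relation strength used to expand a term vector, assuming a normalised co-occurrence score (the number of documents in which two items co-occur, divided by the geometric mean of their document frequencies) and an additive expansion weight `alpha`. Both the measure and the function names are hypothetical illustrations, not the paper's method.

```python
import math
from collections import Counter
from itertools import combinations

def relation_strengths(docs):
    """Hypothetical relation strength: co-occurrence count normalised by
    the geometric mean of the two items' document frequencies."""
    df = Counter()   # number of documents containing each item
    co = Counter()   # number of documents containing each unordered pair
    for doc in docs:
        items = set(doc)
        df.update(items)
        for a, b in combinations(sorted(items), 2):
            co[(a, b)] += 1
    strength = {}
    for (a, b), n in co.items():
        s = n / math.sqrt(df[a] * df[b])
        strength[(a, b)] = s   # store both directions for easy lookup
        strength[(b, a)] = s
    return strength

def expand_vector(vec, strength, alpha=0.5):
    """Add related items to a term-weight vector, weighted by relation strength."""
    expanded = dict(vec)
    for (a, b), s in strength.items():
        if a in vec:
            expanded[b] = expanded.get(b, 0.0) + alpha * s * vec[a]
    return expanded

docs = [["alice", "projectx", "aston"], ["alice", "projectx"], ["bob", "aston"]]
strength = relation_strengths(docs)
expanded = expand_vector({"alice": 1.0}, strength)
print(expanded)
```

Here a query about "alice" gains weight on "projectx" (always co-mentioned) and, more weakly, on "aston", while "bob" is left out because he never co-occurs with her; an expanded vector of this kind can then be compared with document vectors as in an ordinary vector space model.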
Abstract:
Analysing investments in information systems (ISs) in order to maximise benefits has become a prime concern, especially for private corporations. No equilibrium formula exists that could link the amounts invested and the returns accrued; the relationship is simply not straightforward. This thesis is based upon empirical work which involved sketching organisational ethnographies (four organographies and a sectography) of the role and value of information systems in Jordanian financial organisations (JFOs). Besides deciphering the map of impacts, it explains the attributions of the variations in the impacts of ISs, which were found to be related to internal organisational processes: culturally and politically specific considerations, economically or technically rooted factors, and environmental factors. The research serves as an empirical attempt to test the applicability of adopting the interpretive paradigm to researching organisations in a developing country. The fieldwork comprised an exploratory stage, a detailed investigation of four case studies, and a survey stage encompassing 16 organisations. Primary and secondary data were collected from multiple sources using a range of instruments. The evidence highlights the fact that little long-term strategic planning was pursued; the emphasis was more on short-term planning. There was no noticeable adoption of any strategic fit principle linking IS strategy to corporate strategy. In addition, the benefits obtained were mostly intangible. Although ISs were central to the work of the organisations surveyed as the core technology, they were considered tools or work enablers rather than weapons for competitive rivalry. The cultural specificity of IS impacts was evident, and cultural and political considerations were key factors in explaining the attributions of the variations in the impacts of ISs in JFOs. The thesis confirms that measuring the benefits of ISs is problematic.
However, in order to gain more insight, the phenomenon of "the use of ISs" has to be studied within its context.
Abstract:
In this paper we consider the optimisation of Shannon mutual information (MI) in the context of two model neural systems. The first is a stochastic pooling network (population) of McCulloch-Pitts (MP) type neurons (logical threshold units) subject to stochastic forcing; the second is (in a rate coding paradigm) a population of neurons that each display Poisson statistics (the so-called 'Poisson neuron'). The mutual information is optimised as a function of a parameter that characterises the 'noise level': in the MP array this parameter is the standard deviation of the noise, and in the population of Poisson neurons it is the window length used to determine the spike count. In both systems we find that the emergent neural architecture, and hence the code that maximises the MI, is strongly influenced by the noise level. Low noise levels lead to a heterogeneous distribution of neural parameters (diversity), whereas medium to high noise levels result in the clustering of neural parameters into distinct groups that can be interpreted as subpopulations. In both cases the number of subpopulations increases as the noise level decreases. Our results suggest that subpopulations are a generic feature of an information-optimal neural population.
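To make the first model concrete, here is a minimal sketch of how the MI of a stochastic pooling network of N threshold units can be evaluated. It assumes a discretised, Gaussian-weighted input signal x, independent Gaussian noise of standard deviation `sigma` added to each unit, and a network output equal to the number of units that fire. The function names, input distribution and the two threshold configurations compared are illustrative assumptions, not the paper's setup, and the sketch evaluates MI for fixed thresholds rather than optimising them.

```python
import math
import numpy as np

def firing_prob(x, theta, sigma):
    """P(unit fires | x): the unit outputs 1 when x + noise > theta,
    with noise ~ N(0, sigma^2), i.e. Phi((x - theta) / sigma)."""
    return 0.5 * (1.0 + math.erf((x - theta) / (sigma * math.sqrt(2.0))))

def count_distribution(probs):
    """Distribution of the number of firing units (Poisson-binomial),
    built by adding one independent unit at a time."""
    dist = np.zeros(len(probs) + 1)
    dist[0] = 1.0
    for p in probs:
        dist[1:] = dist[1:] * (1 - p) + dist[:-1] * p
        dist[0] *= (1 - p)
    return dist

def pooling_mi(thresholds, sigma, xs, px):
    """I(X;Y) in bits, where Y is the number of units that fire."""
    cond = np.array([count_distribution([firing_prob(x, t, sigma) for t in thresholds])
                     for x in xs])            # rows: P(y | x)
    py = px @ cond                            # marginal P(y)
    mi = 0.0
    for i, pxi in enumerate(px):
        for y, pyx in enumerate(cond[i]):
            if pyx > 0 and py[y] > 0:
                mi += pxi * pyx * math.log2(pyx / py[y])
    return mi

# Illustrative input: a Gaussian-weighted grid of signal values.
xs = np.linspace(-3, 3, 61)
px = np.exp(-xs**2 / 2)
px /= px.sum()
N = 7
for sigma in (0.1, 0.5, 1.0):
    same = pooling_mi([0.0] * N, sigma, xs, px)                   # identical thresholds
    spread = pooling_mi(np.linspace(-1.5, 1.5, N), sigma, xs, px)  # diverse thresholds
    print(sigma, round(same, 3), round(spread, 3))
```

With identical thresholds and very little noise, all units switch together and the output is nearly binary, so spreading the thresholds is one simple way to see why diversity pays off at low noise levels.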
Abstract:
The CancerGrid consortium is developing open-standards cancer informatics to address the challenges posed by modern cancer clinical trials. This paper presents the service-oriented software paradigm implemented in CancerGrid to derive clinical trial information management systems for collaborative cancer research across multiple institutions. Our proposal is founded on a combination of a clinical trial (meta)model and WSRF (Web Services Resource Framework), and is currently being evaluated for use in early phase trials. Although primarily targeted at cancer research, our approach is readily applicable to other areas for which a similar information model is available.
Abstract:
This thesis introduces a flexible visual data exploration framework that combines advanced projection algorithms from the machine learning domain with visual representation techniques developed in the information visualisation domain to help a user effectively explore and understand large multi-dimensional datasets. The advantage of such a framework over other techniques currently available to domain experts is that the user is directly involved in the data mining process and advanced machine learning algorithms are employed for better projection. A hierarchical visualisation model guided by a domain expert allows them to obtain an informed segmentation of the input space. Two other components of this thesis exploit properties of these principled probabilistic projection algorithms: a guided mixture of local experts algorithm that provides robust prediction, and a model that estimates feature saliency simultaneously with the training of a projection algorithm. Local models are useful since a single global model cannot capture the full variability of a heterogeneous data space such as the chemical space. Probabilistic hierarchical visualisation techniques provide an effective soft segmentation of an input space by a visualisation hierarchy whose leaf nodes represent different regions of the input space. We use this soft segmentation to develop a guided mixture of local experts (GME) algorithm, which is appropriate for the heterogeneous datasets found in chemoinformatics problems. Moreover, in this approach the domain experts are more involved in the model development process, which suits a task driven by intuition and domain knowledge, such as drug discovery. We also derive a generative topographic mapping (GTM) based data visualisation approach that estimates feature saliency simultaneously with the training of a visualisation model.
Abstract:
This research examines the role of the information management process within a process-oriented enterprise, Xerox Ltd. The research approach is based on a post-positivist paradigm and has resulted in thirty-five idiographic statements. The three major outcomes are:
1. The process-oriented holistic enterprise is an organisation that requires a long-term management commitment to its development. It depends on the careful management of people, tasks, information and technology. A complex integration of business processes is required; this can be managed through consistent documentation techniques and clearly defined process responsibilities, while management attention to the global metrics and centralised management of the process model are critical to its success.
2. The role of the information management process within a process-oriented enterprise is to provide flexible and cost-effective application, technological and process support to the business. This is best achieved by centralising the management of information management and of the process model. A business-led approach, combined with the consolidation of application, information, process and data architectures, is central to providing effective business- and process-focused support.
3. In a process-oriented holistic enterprise, process and information management are inextricably linked. The model of process management depends heavily on information management, whilst the model of information management is totally focused on supporting and creating the process model. The two models are mutually creating: one cannot exist without the other. There is a duality between process and information management.
Abstract:
This thesis examines children's consumer choice behaviour from an information processing perspective, with the fundamental goal of applying academic research to practical marketing and commercial problems. Following a preface, which describes the academic and commercial terms of reference within which this interdisciplinary study is couched, the thesis comprises four discernible parts. First, the rationale for adopting an information processing perspective is justified, and the diverse array of topics bearing on children's consumer processing and behaviour is brought together. The second part uses this perspective as a springboard to appraise the little-explored role of memory, and especially memory structure, as a central cognitive component in children's consumer choice processing. The main research theme explores the ease with which 10- and 11-year-olds retrieve contemporary consumer information from subjectively defined memory organisations. Using a sort-recall paradigm, the study stimulates hierarchical retrieval processing, and it is contended that when two items known to be stored proximally in the memory organisation are not recalled adjacently, this discrepancy is indicative of retrieval processing ease. Results illustrate the marked influence of task conditions and of the orientation of memory structure on retrieval; these conclusions are accounted for in terms of input and integration failure. The third section develops these findings in the marketing context. A straightforward methodology for structuring marketing situations is proposed, a basis for segmenting children's markets using processing characteristics is adopted, and criteria for communicating brand support information to children are discussed. A taxonomy of market-induced processing conditions is developed. Finally, a case study with topical commercial significance is described.
The development, launch and marketing of a new product in the confectionery market is outlined, the aetiology of its subsequent demise identified and expounded, and prescriptive guidelines are put forward to help avert future repetition of marketing misjudgements.
Abstract:
The slowdown in the drug discovery pipeline is, in part, owing to a lack of structural and functional information available for new drug targets. Membrane proteins, the targets of well over 50% of marketed pharmaceuticals, present a particular challenge. As they are not naturally abundant, they must be produced recombinantly for the structural biology that is a prerequisite to structure-based drug design. Unfortunately, obtaining high yields of functional, recombinant membrane proteins remains a major bottleneck in contemporary bioscience. While repeated rounds of trial-and-error optimization have not revealed (and cannot reveal) mechanistic details of the biology of recombinant protein production, examination of the host response has provided new insights. To this end, we published an early transcriptome analysis that identified genes implicated in high-yielding yeast cell factories, which has enabled the engineering of improved production strains. These advances offer hope that the bottleneck of membrane protein production can be relieved rationally.
Abstract:
Procedural knowledge is the knowledge required to perform certain tasks. It forms an important part of expertise, and is crucial for learning new tasks. This paper summarises existing work on procedural knowledge acquisition, and identifies two major challenges that remain to be solved in this field: automating the acquisition process to tackle the bottleneck in the formalization of procedural knowledge, and enabling machine understanding and manipulation of procedural knowledge. It is believed that recent advances in information extraction techniques can be applied to compose a comprehensive solution to these challenges. We identify specific tasks required to achieve the goal, and present detailed analyses of new research challenges and opportunities. It is expected that these analyses will interest researchers of various knowledge management tasks, particularly knowledge acquisition and capture.
Abstract:
In current organizations, valuable enterprise knowledge is often buried under a rapidly expanding volume of unstructured information in the form of web pages, blogs, and other forms of human text communication. We present a novel unsupervised machine learning method called CORDER (COmmunity Relation Discovery by named Entity Recognition) to turn these unstructured data into structured information for knowledge management in these organizations. CORDER exploits named entity recognition and co-occurrence data to associate individuals in an organization with their expertise and associates. We discuss the problems associated with evaluating unsupervised learners and report our initial evaluation experiments in an expert evaluation, a quantitative benchmarking, and an application of CORDER in a social networking tool called BuddyFinder.
Abstract:
We introduce a flexible visual data mining framework which combines advanced projection algorithms from the machine learning domain and visual techniques developed in the information visualization domain. The advantage of such an interface is that the user is directly involved in the data mining process. We integrate principled projection algorithms, such as generative topographic mapping (GTM) and hierarchical GTM (HGTM), with powerful visual techniques, such as magnification factors, directional curvatures, parallel coordinates and billboarding, to provide a visual data mining framework. Results on a real-life chemoinformatics dataset using GTM are promising and have been analytically compared with the results from the traditional projection methods. It is also shown that the HGTM algorithm provides additional value for large datasets. The computational complexity of these algorithms is discussed to demonstrate their suitability for the visual data mining framework. Copyright 2006 ACM.
Abstract:
Purpose – On 29 January 2001, Euronext LIFFE introduced single security futures contracts on a range of global companies. The purpose of this paper is to examine the impact that the introduction of these futures contracts had on the behaviour of opening and closing UK equity returns. Design/methodology/approach – The paper models the price discovery process using the Amihud and Mendelson partial adjustment model, which can be estimated using a Kalman filter. Findings – Empirical results show that during the pre-futures period both opening and closing returns under-react to new information. After the introduction of futures contracts, opening returns over-react. A rise in the partial adjustment coefficient also takes place for closing returns, but this is not large enough to cause over-reaction. Originality/value – This is the first study to examine the impact of a single security futures contract on the speed of spot market price discovery.
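In the usual formulation of the Amihud and Mendelson partial adjustment model, the observed (log) price moves only part of the way towards an unobserved intrinsic value: P_t - P_{t-1} = g(V_t - P_{t-1}) + u_t, with V_t following a random walk, V_t = V_{t-1} + e_t. A coefficient g < 1 indicates under-reaction and g > 1 over-reaction. Treating V_t as a hidden state gives a linear state-space model, so a Kalman filter yields the likelihood of g. The sketch below assumes Gaussian noise with known standard deviations and estimates g by a grid search on simulated data; the function names and parameter values are illustrative, and a real application would estimate the noise variances jointly with g.

```python
import math
import numpy as np

def simulate_partial_adjustment(T, g, sig_e, sig_u, seed=0):
    """Simulate log prices that partially adjust towards a random-walk value."""
    rng = np.random.default_rng(seed)
    V = np.zeros(T)
    P = np.zeros(T)
    for t in range(1, T):
        V[t] = V[t - 1] + rng.normal(0.0, sig_e)                          # intrinsic value
        P[t] = P[t - 1] + g * (V[t] - P[t - 1]) + rng.normal(0.0, sig_u)  # observed price
    return P

def kalman_loglik(P, g, sig_e, sig_u):
    """Gaussian log-likelihood of g from a Kalman filter on the hidden value V_t.
    Observation equation: P_t = g * V_t + (1 - g) * P_{t-1} + u_t."""
    x, var = P[0], sig_e**2        # filtered estimate of V and its variance
    ll = 0.0
    for t in range(1, len(P)):
        var_pred = var + sig_e**2                  # predict: V_t = V_{t-1} + e_t
        v = P[t] - (g * x + (1 - g) * P[t - 1])    # innovation
        F = g * g * var_pred + sig_u**2            # innovation variance
        ll += -0.5 * (math.log(2 * math.pi * F) + v * v / F)
        K = var_pred * g / F                       # Kalman gain
        x = x + K * v                              # update state estimate
        var = (1 - K * g) * var_pred               # update state variance
    return ll

P = simulate_partial_adjustment(2000, g=1.3, sig_e=0.01, sig_u=0.002, seed=1)
grid = np.linspace(0.5, 1.9, 141)
lls = [kalman_loglik(P, g, 0.01, 0.002) for g in grid]
g_hat = grid[int(np.argmax(lls))]
print(f"estimated g = {g_hat:.2f}")
```

Comparing estimates of g from pre- and post-event windows is the kind of before/after comparison the abstract describes; a grid search is used here only because it is easy to read, and a numerical optimiser would normally replace it.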
Abstract:
The evaluation of geospatial data quality and trustworthiness presents a major challenge to geospatial data users when making a dataset selection decision. The research presented here therefore focused on defining and developing a GEO label – a decision support mechanism to assist data users in efficient and effective geospatial dataset selection on the basis of quality, trustworthiness and fitness for use. This thesis thus presents six phases of research and development conducted to: (1) identify the informational aspects upon which users rely when assessing geospatial dataset quality and trustworthiness; (2) elicit initial user views on the GEO label role in supporting dataset comparison and selection; (3) evaluate prototype label visualisations; (4) develop a Web service to support GEO label generation; (5) develop a prototype GEO label-based dataset discovery and intercomparison decision support tool; and (6) evaluate the prototype tool in a controlled human-subject study. The results of the studies revealed, and subsequently confirmed, eight geospatial data informational aspects that were considered important by users when evaluating geospatial dataset quality and trustworthiness, namely: producer information, producer comments, lineage information, compliance with standards, quantitative quality information, user feedback, expert reviews, and citation information. Following an iterative user-centred design (UCD) approach, it was established that the GEO label should visually summarise the availability of these key informational aspects and allow users to interrogate them. A Web service was developed to support generation of dynamic GEO label representations and integrated into a number of real-world GIS applications. The service was also utilised in the development of the GEO LINC tool – a GEO label-based dataset discovery and intercomparison decision support tool.
The results of the final evaluation study indicated that (a) the GEO label effectively communicates the availability of dataset quality and trustworthiness information and (b) GEO LINC successfully facilitates ‘at a glance’ dataset intercomparison and fitness for purpose-based dataset selection.
Abstract:
This thesis is a qualitative case study that draws upon a grounded genre analysis approach situated within the social constructivist paradigm. The study describes the various obligatory, desired, and optional moves used by postgraduate students as they interacted within an online, non-judgmental environment in order to seek solutions to issues they were experiencing with their research projects or teaching. The postgraduate students (case participants) met individually online with me at pre-arranged times to take part in 30-minute to one-hour Instant Messenger Cooperative Development (IMCD) (Boon, 2005) sessions via the text-chat function of Skype. Participants took on the role of ‘Explorer’ in order to articulate their thoughts and ideas about their research. I took on the role of ‘Understander’ to provide support to each Explorer by reflecting my understanding of the ongoing articulations as the Explorers investigated their specific issues, determined possible ways to overcome them, made new discoveries, and formulated plans of action regarding the best way for them to move forward. The description of generic moves covers 32 IMCD sessions collected over a three-year period (2009-2012) from 10 different participants (A-J). The data are drawn from live IMCD sessions, field notes, and post-session email feedback from participants. In particular, the thesis focuses on describing the specific generic moves of Explorers within IMCD sessions as they seek satisfactory resolutions to particular research or pedagogic puzzles. It also provides a detailed description of a longitudinal case (Participant A – four sessions), a one-session case (Participant B – one session), and an outlier case in which the Explorer underwent a negative IMCD experience.
The thesis concludes by arguing that IMCD is a highly effective tool that helps facilitate the research process for both distance-learning and on-campus students and has the potential to be utilized across all disciplines at the tertiary level.