The last decades have been characterized by the continuous adoption of IT solutions in the healthcare sector, which has resulted in the proliferation of tremendous amounts of data across heterogeneous systems. Distinct data types are currently generated, manipulated, and stored in the several institutions where patients are treated. Data sharing and integrated access to this information will allow the extraction of relevant knowledge that can lead to better diagnostics and treatments. This thesis proposes new integration models for gathering information and extracting knowledge from multiple and heterogeneous biomedical sources. The complexity of the scenario led us to split the integration problem according to data type and usage specificity. The first contribution is a cloud-based architecture for exchanging medical imaging services. It offers a simplified registration mechanism for providers and services, promotes remote data access, and facilitates the integration of distributed data sources. Moreover, it is compliant with international standards, ensuring the platform's interoperability with current medical imaging devices. The second proposal is a sensor-based architecture for the integration of electronic health records. It follows a federated integration model and aims to provide a scalable solution to search and retrieve data from multiple information systems. The last contribution is an open architecture for gathering patient-level data from dispersed and heterogeneous databases. All the proposed solutions were deployed and validated in real-world use cases.


Background: Traditionally, communicable diseases were the main causes of disease burden in developing countries like Nepal. In recent years, non-communicable diseases (NCDs), mainly cardiovascular diseases (CVDs), cancer, chronic respiratory diseases and diabetes mellitus, have imposed a larger disease burden than communicable diseases. Most elements of health and medicine policies in Nepal are still focused on communicable diseases. There is limited evidence about NCDs and NCD medicines in Nepal. Aim: To explore the gap between the burden of NCDs and the availability and affordability of NCD medicines in Nepal. Methods: Biomedical databases such as Medline, Scopus and Web of Science, and other online sources (including Global Burden of Diseases data), were searched for data on the burden of NCDs in terms of Disability Adjusted Life Years (DALYs). The Essential Medicines List (EML) of Nepal was compared with the World Health Organisation EML (WHO-EML) for inclusion of NCD medicines. Results: In Nepal, NCDs caused nearly 45% of the total 10.5 million DALYs in 2010. CVDs (15.2%) were the leading cause of NCD burden, followed by chronic respiratory diseases (14.7%), cancer (7.3%) and diabetes mellitus (3.2%). One hospital-based national survey found that 37% of hospitalised patients had NCDs. Among them, 38% had heart disease, followed by COPD (33%) and diabetes (10%). Most (23 out of 28) non-cancer NCD medicines recommended in the WHO-EML were present in Nepal's EML, theoretically indicating good availability. However, it is difficult to say whether they are accessible and affordable due to the lack of adequate data on access and pricing. Conclusion: This study gives some insight into the burden of NCDs. Although NCD medicines are available in Nepal, further research is required to determine whether they are accessible and affordable to the general population.


Objective High utilisation of the emergency department (ED) among the elderly is of worldwide concern. This study aims to review the effectiveness of interventions targeting the elderly population in reducing ED utilisation. Methods Major biomedical databases were searched for relevant studies. A qualitative approach was applied to derive common themes in the myriad interventions and to critically assess the variations influencing the interventions' effectiveness. Quality of studies was appraised using the Effective Public Health Practice Project (EPHPP) tool. Results 36 studies were included. Nine of 16 community-based interventions reported significant reductions in ED utilisation. Five of 20 hospital-based interventions proved effective while another four demonstrated failure. Seven key elements were identified. Ten of 14 interventions associated with a significant reduction in ED use integrated at least three of the seven elements. All four interventions with significant negative results lacked five or more of the seven elements. Some key elements, including a multidisciplinary team, integrated primary care and social care, were often present in effective interventions but absent in all significantly ineffective ones. Conclusions The investigated interventions have mixed effectiveness. Our findings suggest that the hospital-based interventions have relatively poorer effects and should be better connected to the community-based strategies. Interventions seem to achieve the most success with integration of multi-layered elements, especially when incorporating key elements such as a nurse-led multidisciplinary team, integrated social care, and strong linkages to the longer-term primary and community care. Notwithstanding limitations in generalising the findings, this review builds on the growing body of evidence in this particular area.


BACKGROUND Chikungunya and dengue infections are spatio-temporally related. The current review aims to determine the geographic limits of chikungunya, dengue and the principal mosquito vectors for both viruses, and to synthesise current epidemiological understanding of their co-distribution. METHODS Three biomedical databases (PubMed, Scopus and Web of Science) were searched from their inception until May 2015 for studies that reported concurrent detection of chikungunya and dengue viruses in the same patient. Additionally, data from WHO, CDC and Healthmap alerts were extracted to create up-to-date global distribution maps for both dengue and chikungunya. RESULTS Evidence for chikungunya-dengue co-infection has been found in Angola, Gabon, India, Madagascar, Malaysia, Myanmar, Nigeria, Saint Martin, Singapore, Sri Lanka, Tanzania, Thailand and Yemen; these constitute only 13 of the 98 countries/territories where both chikungunya and dengue epidemic/endemic transmission have been reported. CONCLUSIONS Understanding the true extent of chikungunya-dengue co-infection is hampered by current diagnosis being largely based on their similar symptoms. Heightened awareness of chikungunya among the public and public health practitioners with the advent of the ongoing outbreak in the Americas can be expected to improve diagnostic rigour. Maps generated from the newly compiled lists of the geographic distribution of both pathogens and vectors represent the current geographical limits of chikungunya and dengue, as well as the countries/territories at risk of future incursion by both viruses. These describe regions of co-endemicity in which lab-based diagnosis of suspected cases is of higher priority.


The human genome sequencing project paved the way for the emergence of new transdisciplinary research areas, such as computational biology, bioinformatics and biostatistics. One of the results to emerge from this advent was DNA microarray technology, which allows the study of the expression profile of thousands of genes when subjected to external perturbations. Although it is a relatively consolidated technology, it still presents a wide range of challenges, particularly from the computational and information systems point of view. Examples include optimising data processing procedures and developing methodologies for the semi-automatic interpretation of results. The main objective of this work was to explore new technical solutions to streamline the storage, sharing and analysis of data from microarray experiments. To this end, a requirements analysis was carried out for the main stages of running an experiment, in which the main shortcomings were identified, improvement strategies proposed and new solutions presented. At the level of laboratory data management, a LIMS (Laboratory Information Management System) is proposed that enables the management of all the data generated and the procedures performed. This system also integrates a solution that allows experiments to be shared, in order to promote the collaborative participation of several researchers in the same project, even when they use different LIMS. In the context of data analysis, a model is presented that facilitates the integration of algorithms for processing and analysing experiments into the developed system. Finally, a solution is proposed to facilitate the biological interpretation of a set of differentially expressed genes, through tools that integrate information available in several biomedical databases.


In lucid dreams the dreamer is aware of dreaming and often able to influence the ongoing dream content. Lucid dreaming is a learnable skill, and a variety of techniques have been suggested for lucid dream induction. This systematic review evaluated the evidence for the effectiveness of induction techniques. A comprehensive literature search was carried out in biomedical databases and specific resources. Thirty-five studies were included in the analysis (11 sleep laboratory and 24 field studies), of which 26 employed cognitive techniques, 11 external stimulation and one drug application. The methodological quality of the included studies was relatively low. None of the induction techniques were verified to induce lucid dreams reliably and consistently, although some of them look promising. On the basis of the reviewed studies, a taxonomy of lucid dream induction methods is presented. Several methodological issues are discussed and directions for future studies are proposed.


Text classification is essential for narrowing down the number of documents relevant to a particular topic for further study, especially when searching through large biomedical databases. Protein-protein interactions are an example of such a topic, with databases devoted specifically to them. This paper proposes a semi-supervised learning algorithm via local learning with class priors (LL-CP) for biomedical text classification, in which unlabeled data points are classified in a vector space based on their proximity to labeled nodes. The algorithm has been evaluated on a corpus of biomedical documents to identify abstracts containing information about protein-protein interactions, with promising results. Experimental results show that LL-CP outperforms traditional semi-supervised learning algorithms such as SVM, and that it also performs better than local learning without class priors.
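
The proximity-based idea described above can be illustrated with a small sketch. This is not the LL-CP algorithm itself, whose details the abstract does not give; it is a simplified stand-in that assumes cosine similarity, a k-nearest-labeled-neighbour vote, and a multiplicative class prior. The function names, the toy vectors and the prior values are all hypothetical.

```python
import math
from collections import defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def classify(point, labeled, priors, k=3):
    """Label `point` from its k most similar labeled neighbours.

    labeled: list of (vector, label) pairs.
    priors:  dict mapping label -> assumed prior probability.
    Each neighbour votes with its similarity; the summed vote per
    class is then scaled by that class's prior (an assumption made
    for this sketch, not the method from the paper).
    """
    neighbours = sorted(
        labeled, key=lambda item: cosine(point, item[0]), reverse=True
    )[:k]
    scores = defaultdict(float)
    for vector, label in neighbours:
        scores[label] += max(cosine(point, vector), 0.0)
    weighted = {label: s * priors.get(label, 0.0) for label, s in scores.items()}
    return max(weighted, key=weighted.get)


# Toy two-dimensional "document vectors" (hypothetical data).
LABELED = [
    ([1.0, 0.0], "ppi"),
    ([0.9, 0.1], "ppi"),
    ([0.0, 1.0], "other"),
    ([0.1, 0.9], "other"),
]
PRIORS = {"ppi": 0.3, "other": 0.7}

print(classify([0.95, 0.05], LABELED, PRIORS, k=3))
print(classify([0.05, 0.95], LABELED, PRIORS, k=3))
```

In this toy setting the prior only tips the decision when the neighbourhood vote is close; a point lying clearly among the "ppi" examples is still labeled "ppi" despite the lower prior.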


Objective: The study aimed to examine the difference in response rates between opt-out and opt-in participant recruitment in a population-based study of heavy-vehicle drivers involved in a police-attended crash. Methods: Two approaches to subject recruitment were implemented in two different states over a 14-week period, and response rates for the two approaches (opt-out versus opt-in recruitment) were compared. Results: Based on the eligible and contactable drivers, the response rates were 54% for the opt-out group and 16% for the opt-in group. Conclusions and Implications: The opt-in recruitment strategy (which was a consequence of one jurisdiction's interpretation of the national Privacy Act at the time) resulted in an insufficient and potentially biased sample for the purposes of conducting research into risk factors for heavy-vehicle crashes. Australia's national Privacy Act 1988 has had a long history of inconsistent application by state and territory government departments and ethical review committees. These inconsistencies can have profound effects on the validity of research, as shown by the significantly different response rates reported in this study. It is hoped that a more unified interpretation of the Privacy Act across the states and territories, as proposed under the soon-to-be-released Australian Privacy Principles, will reduce the recruitment challenges outlined in this study.


Human molecular biology is a highly complex and diverse field in which research is conducted in many areas. The focus here lies in particular on genomics, proteomics, transcriptomics and metabolomics, and years of research have accumulated large amounts of valuable data. This collection is growing steadily, and no stagnation is foreseeable in the future. By now, however, this permanent flood of information has buried valuable knowledge in unmanageable mountains of digital data and has made gathering research-specific and reliable information a major challenge. The work presented in this dissertation has generated a comprehensive compendium of human tissues for biomedical analyses. It is named medicalgenomics.org and has solved various biomedical problems in the search for specific knowledge across numerous databases. The compendium is the first of its kind, and the knowledge it provides will help scientists gain a better systematic overview of specific genes or functional profiles with regard to regulation as well as pathological and physiological conditions. In addition, various query methods enable efficient analysis of signalling events and metabolic pathways, as well as the study of genes at the expression level. The full range of these query options enables scientists to create highly specialised genetic road maps, with which future experiments can be planned more precisely. As a result, valuable resources and time can be saved while the prospects of success increase. Furthermore, the comprehensive knowledge of the compendium can be used to generate and test biomedical hypotheses.


Over the last years, and particularly in the context of the COMBIOMED network, our biomedical informatics (BMI) group at the Universidad Politecnica de Madrid has pursued several approaches to address a fundamental issue: facilitating open access to, and retrieval of, BMI resources, including software, databases and services. In this regard, we have followed various directions: a) a text mining-based approach to automatically build a “resourceome”, an inventory of open resources; b) methods for heterogeneous database integration, including clinical, -omics and nanoinformatics sources; c) the creation of various services to provide African users and professionals with access to different resources; and d) an approach to facilitate access to open resources from research projects.


Context: Because positive biomedical observations are more often published than those reporting no effect, initial observations are often refuted or attenuated by subsequent studies. Objective: To determine whether newspapers preferentially report on initial findings and whether they also report on subsequent studies. Methods: We focused on attention deficit hyperactivity disorder (ADHD). Using the Factiva and PubMed databases, we identified 47 scientific publications on ADHD published in the 1990s and soon echoed by 347 newspaper articles. We selected the ten most echoed publications and collected all their relevant subsequent studies until 2011. We checked whether findings reported in each "top 10" publication were consistent with previous and subsequent observations. We also compared the newspaper coverage of the "top 10" publications to that of their related scientific studies. Results: Seven of the "top 10" publications were initial studies, and the conclusions in six of them were either refuted or strongly attenuated subsequently. The seventh was neither confirmed nor refuted, but its main conclusion appears unlikely. Among the three "top 10" publications that were not initial studies, two were confirmed subsequently and the third was attenuated. The newspaper coverage of the "top 10" publications (223 articles) was much larger than that of the 67 related studies (57 articles). Moreover, only one of the latter newspaper articles reported that the corresponding "top 10" finding had been attenuated. The average impact factor of the scientific journals publishing studies echoed by newspapers (17.1, n = 56) was higher (p < 0.0001) than that corresponding to related publications that were not echoed (6.4, n = 56). Conclusion: Because newspapers preferentially echo initial ADHD findings appearing in prominent journals, they report on uncertain findings that are often refuted or attenuated by subsequent studies.
If this media reporting bias generalizes to health sciences, it represents a major cause of distortion in health science communication.


The overwhelming amount and unprecedented speed of publication in the biomedical domain make it difficult for life science researchers to acquire and maintain a broad view of the field and to gather all the information that would be relevant for their research. In response to this problem, the BioNLP (Biomedical Natural Language Processing) community of researchers has emerged. It strives to assist life science researchers by developing modern natural language processing (NLP), information extraction (IE) and information retrieval (IR) methods that can be applied at large scale to scan the whole publicly available biomedical literature and to extract and aggregate the information found within, while automatically normalizing the variability of natural language statements. Among different tasks, biomedical event extraction has recently received much attention within the BioNLP community. Biomedical event extraction consists of the identification of biological processes and interactions described in the biomedical literature, and their representation as a set of recursive event structures. The 2009–2013 series of BioNLP Shared Tasks on Event Extraction has given rise to a number of event extraction systems, several of which have been applied at large scale (the full set of PubMed abstracts and PubMed Central Open Access full-text articles), leading to the creation of massive biomedical event databases, each containing millions of events. Since top-ranking event extraction systems are based on machine-learning approaches and are trained on the narrow-domain, carefully selected Shared Task training data, their performance drops when they are faced with the topically highly varied PubMed and PubMed Central documents. Specifically, false-positive predictions by these systems lead to the generation of incorrect biomolecular events, which are spotted by the end users.
This thesis proposes a novel post-processing approach, utilizing a combination of supervised and unsupervised learning techniques, that can automatically identify and filter out a considerable proportion of incorrect events from large-scale event databases, thus increasing the general credibility of those databases. The second part of this thesis is dedicated to a system we developed for hypothesis generation from large-scale event databases, which is able to discover novel biomolecular interactions among genes/gene products. We cast the hypothesis generation problem as supervised network topology prediction, i.e., predicting new edges in the network, as well as types and directions for these edges, utilizing a set of features that can be extracted from large biomedical event networks. Routine machine learning evaluation results, as well as manual evaluation results, suggest that the problem is indeed learnable. This work won the Best Paper Award at the 5th International Symposium on Languages in Biology and Medicine (LBM 2013).
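
To make the idea of casting hypothesis generation as edge prediction more concrete, the following minimal sketch computes a few standard topological features for node pairs that are not yet connected. It is an illustration under simplifying assumptions, not the feature set used in the thesis: the graph is treated as undirected and untyped, the three features (common neighbours, Jaccard coefficient, preferential attachment) are generic link-prediction features chosen for this example, and the gene names and edges are made up. In a supervised setting, such feature vectors would be fed to a classifier trained on known edges.

```python
from itertools import combinations


def build_adjacency(edges):
    """Build an undirected adjacency map from a list of (a, b) pairs."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    return adjacency


def pair_features(adjacency, u, v):
    """Topological features for the candidate edge (u, v)."""
    neighbours_u = adjacency.get(u, set())
    neighbours_v = adjacency.get(v, set())
    common = neighbours_u & neighbours_v
    union = neighbours_u | neighbours_v
    return {
        "common_neighbours": len(common),
        "jaccard": len(common) / len(union) if union else 0.0,
        "preferential_attachment": len(neighbours_u) * len(neighbours_v),
    }


def candidate_pairs(adjacency):
    """All node pairs that are not yet connected by an edge."""
    nodes = sorted(adjacency)
    return [(u, v) for u, v in combinations(nodes, 2) if v not in adjacency[u]]


# Hypothetical toy interaction network among four gene products.
EDGES = [("A", "B"), ("A", "C"), ("B", "C"), ("B", "D"), ("C", "D")]
ADJACENCY = build_adjacency(EDGES)

for u, v in candidate_pairs(ADJACENCY):
    print(u, v, pair_features(ADJACENCY, u, v))
```

Predicting edge types and directions, as the thesis describes, would require a directed, labeled graph and additional features; the sketch only covers the existence of an edge.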