889 results for heterogeneous data sources


Relevance:

90.00%

Publisher:

Abstract:

ENGLISH: EASTROPIC Expedition was a cooperative oceanographic study of the eastern tropical Pacific Ocean conducted during the period 2 October through 16 December 1955. The five participating agencies and the ships they operated were: Scripps Institution of Oceanography (SIO), Spencer F. Baird and Horizon; Pacific Oceanic Fisheries Investigations (POFI) of the U. S. Fish and Wildlife Service, now Honolulu Biological Laboratory (HBL) of the U. S. Bureau of Commercial Fisheries, Hugh M. Smith; California Department of Fish and Game, N. B. Scofield; the Peruvian Navy, Bondu; and the Inter-American Tropical Tuna Commission, which operated no vessels but supplied equipment and personnel. In addition to these planned participations in EASTROPIC Expedition, valuable information was provided by CCOFI Cruise 5512 of the California Cooperative Oceanic Fisheries Investigations, conducted during the period 29 November through 16 December 1955 with the two vessels Stranger and Black Douglas. While the observational programs of most of the agencies involved, in part, special hydrographic-biological studies of known features and processes in the region (see reports listed under Data Sources), the deployment of ships, and therefore of observations, was sufficient that EASTROPIC Expedition could be considered a survey of the eastern tropical Pacific. This report is concerned with that aspect of the Expedition and is a presentation in atlas form of most of the hydrographic data collected. For reasons given below, emphasis has been placed on the upper 300 m of the water column. SPANISH: La Expedición EASTROPIC es un estudio oceanográfico cooperativo del Océano Pacífico Oriental Tropical llevado a cabo durante el período del 2 de octubre al 16 de diciembre de 1955. Las cinco agencias participantes y los barcos operados por ellas son los siguientes: Scripps Institution of Oceanography (SIO), Spencer F. Baird y Horizon; Pacific Oceanic Fisheries Investigations (POFI) del U. S. Fish and Wildlife Service, ahora Honolulu Biological Laboratory (HBL) del U. S. Bureau of Commercial Fisheries, Hugh M. Smith; California Department of Fish and Game, N. B. Scofield; la Marina Peruana, Bondu; y la Comisión Interamericana del Atún Tropical que no dirigió ningún barco pero proporcionó equipo y personal. Además de estas participaciones planeadas en la Expedición EASTROPIC, fue suministrada información de valor por el Crucero CCOFI 5512 del California Cooperative Oceanic Fisheries Investigations, llevado a cabo durante el período del 29 de noviembre al 16 de diciembre de 1955 con los barcos Stranger y Black Douglas. Aunque los programas de observación de la mayoría de las agencias comprendieron en parte estudios especiales hidrográficos y biológicos de las características y de los procesos conocidos de la región (véase los informes indicados bajo Fuente de Datos), el despliegue de los barcos, y por lo tanto, de las observaciones, fue suficiente para que la Expedición EASTROPIC pudiera ser considerada como una encuesta del Pacífico Oriental Tropical. Este informe se refiere a este aspecto de la Expedición y es una presentación, en forma de un atlas, de la mayoría de los datos hidrográficos recolectados. Por las razones que se dan a continuación, se le dio énfasis a los 300 m superiores de la columna de agua. (PDF contains 136 pages.)

Relevance:

90.00%

Publisher:

Abstract:

The primary objective of this project, “the Assessment of Existing Information on Atlantic Coastal Fish Habitat”, is to inform conservation planning for the Atlantic Coastal Fish Habitat Partnership (ACFHP). ACFHP is recognized as a Partnership by the National Fish Habitat Action Plan (NFHAP), whose overall mission is to protect, restore, and enhance the nation’s fish and aquatic communities through partnerships that foster fish habitat conservation. This project is a cooperative effort of the NOAA/NOS Center for Coastal Monitoring and Assessment (CCMA) Biogeography Branch and ACFHP. The Assessment includes three components: 1. a representative bibliographic and assessment database, 2. a Geographical Information System (GIS) spatial framework, and 3. a summary document with a description of methods, analyses of habitat assessment information, and recommendations for further work. The spatial bibliography was created by linking the bibliographic table, developed in Microsoft Excel and exported to SQL Server, with the spatial framework, developed in ArcGIS and exported to GoogleMaps. The bibliography is a comprehensive, searchable database of over 500 selected documents and data sources on Atlantic coastal fish species and habitats. Key information captured for each entry includes basic bibliographic data, spatial footprint (e.g. waterbody or watershed), species and habitats covered, and electronic availability. Information on habitat condition indicators, threats, and conservation recommendations is extracted from each entry and recorded in a separate linked table. The spatial framework is a functional digital map based on polygon layers of watersheds and estuarine and marine waterbodies derived from NOAA’s Coastal Assessment Framework, MMS/NOAA’s Multipurpose Marine Cadastre, and other sources, providing spatial reference for all of the documents cited in the bibliography. Together, the bibliography and assessment tables and their spatial framework provide a powerful tool to query and assess available information through a publicly available web interface. They were designed to support the development of priorities for ACFHP’s conservation efforts within a geographic area extending from Maine to Florida, and from coastal watersheds seaward to the edge of the continental shelf. The Atlantic Coastal Fish Habitat Partnership has made initial use of the Assessment of Existing Information (AEI). Though it has not yet applied the AEI in a systematic or structured manner, it expects to find further uses as the draft conservation strategic plan is refined and as regional action plans are developed. The AEI also provides a means to move beyond an “assessment of existing information” towards an “assessment of fish habitat”, and is being applied towards the NFHAP 2010 Assessment. Beyond the scope of the current project, there may be application to broader initiatives such as Integrated Ecosystem Assessments (IEAs), Ecosystem Based Management (EBM), and Marine Spatial Planning (MSP).
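
As a rough illustration of how such a linked bibliographic-spatial database can be queried, the sketch below builds a toy schema and runs one query; all table, column, and place names are assumptions for illustration and do not reflect the actual ACFHP/CCMA design.

```python
import sqlite3

# Illustrative schema only: names are assumptions, not the ACFHP/CCMA database design.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE bibliography (
    doc_id INTEGER PRIMARY KEY,
    title TEXT, year INTEGER, available_online INTEGER
);
CREATE TABLE spatial_unit (
    unit_id INTEGER PRIMARY KEY,
    name TEXT, unit_type TEXT          -- e.g. 'watershed', 'estuary', 'marine'
);
CREATE TABLE doc_spatial (             -- spatial footprint of each document
    doc_id INTEGER REFERENCES bibliography(doc_id),
    unit_id INTEGER REFERENCES spatial_unit(unit_id)
);
CREATE TABLE doc_habitat (             -- habitats covered by each document
    doc_id INTEGER REFERENCES bibliography(doc_id),
    habitat TEXT
);
""")

# Example query: all documents covering submerged aquatic vegetation in a given estuary.
rows = conn.execute("""
    SELECT b.title, b.year
    FROM bibliography b
    JOIN doc_spatial ds ON ds.doc_id = b.doc_id
    JOIN spatial_unit s ON s.unit_id = ds.unit_id
    JOIN doc_habitat h ON h.doc_id = b.doc_id
    WHERE s.name = 'Chesapeake Bay' AND h.habitat = 'SAV'
""").fetchall()
print(rows)
```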

Relevance:

90.00%

Publisher:

Abstract:

In the Climate Change Act of 2008 the UK Government pledged to reduce carbon emissions by 80% by 2050. As one step towards this, regulations are being introduced requiring all new buildings to be ‘zero carbon’ by 2019. These are defined as buildings which emit net zero carbon during their operational lifetime. However, in order to meet the 80% target it is necessary to reduce the carbon emitted during the whole life-cycle of buildings, including that emitted during the processes of construction. These elements make up the ‘embodied carbon’ of the building. While there are no regulations yet in place to restrict embodied carbon, a number of different approaches have been taken. There are several existing databases of embodied carbon and embodied energy. Most provide data for the material extraction and manufacturing only, the ‘cradle to factory gate’ phase. In addition to the databases, various software tools have been developed to calculate embodied energy and carbon of individual buildings. A third source of data comes from the research literature, in which individual life cycle analyses of buildings are reported. This paper provides a comprehensive review, comparing and assessing data sources, boundaries and methodologies. The paper concludes that the wide variations in these aspects produce incomparable results. It highlights the areas where existing data is reliable, and where new data and more precise methods are needed. This comprehensive review will guide the future development of a consistent and transparent database and software tool to calculate the embodied energy and carbon of buildings.
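
For readers unfamiliar with the quantity under review, the sketch below shows the basic cradle-to-gate arithmetic that such databases and tools support: material quantities multiplied by emission factors and summed. All figures are placeholders, not values from any of the reviewed databases.

```python
# Minimal sketch of a cradle-to-gate embodied carbon calculation.
# All figures below are placeholders chosen for illustration only.

material_quantities_kg = {            # bill of quantities for a hypothetical building
    "concrete": 250_000,
    "steel": 12_000,
    "timber": 8_000,
}

emission_factors_kgco2e_per_kg = {    # cradle-to-gate factors (illustrative only)
    "concrete": 0.11,
    "steel": 1.40,
    "timber": 0.45,
}

embodied_carbon_kgco2e = sum(
    material_quantities_kg[m] * emission_factors_kgco2e_per_kg[m]
    for m in material_quantities_kg
)
print(f"Cradle-to-gate embodied carbon: {embodied_carbon_kgco2e / 1000:.1f} tCO2e")
```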

Relevance:

90.00%

Publisher:

Abstract:

Compared with structured data sources that are usually stored and analyzed in spreadsheets, relational databases, and single data tables, unstructured construction data sources such as text documents, site images, web pages, and project schedules have been less intensively studied due to additional challenges in data preparation, representation, and analysis. In this paper, our vision for data management and mining addressing such challenges is presented, together with related research results from previous work, as well as our recent developments of data mining on text-based, web-based, image-based, and network-based construction databases.
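
A minimal sketch of one such text-based task (categorizing construction documents by type) follows; the pipeline, labels, and sample texts are illustrative and are not the authors' system.

```python
# Toy document categorization for construction text: TF-IDF features + linear classifier.
from sklearn.pipeline import Pipeline
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

docs = [
    "Request for information regarding rebar spacing on level 3 slab",
    "Change order: additional excavation required due to unforeseen utilities",
    "Incident report: worker slipped on scaffolding, no injury",
]
labels = ["rfi", "change_order", "safety"]

model = Pipeline([
    ("tfidf", TfidfVectorizer(ngram_range=(1, 2), min_df=1)),
    ("clf", LogisticRegression(max_iter=1000)),
])
model.fit(docs, labels)
print(model.predict(["Safety report: near miss at crane lift area"]))
```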

Relevance:

90.00%

Publisher:

Abstract:

Transcriptional regulation has been studied intensively in recent decades. One important aspect of this regulation is the interaction between regulatory proteins, such as transcription factors (TF) and nucleosomes, and the genome. Different high-throughput techniques have been invented to map these interactions genome-wide, including ChIP-based methods (ChIP-chip, ChIP-seq, etc.), nuclease digestion methods (DNase-seq, MNase-seq, etc.), and others. However, a single experimental technique often only provides partial and noisy information about the whole picture of protein-DNA interactions. Therefore, the overarching goal of this dissertation is to provide computational developments for jointly modeling different experimental datasets to achieve a holistic inference on the protein-DNA interaction landscape.

We first present a computational framework that can incorporate the protein binding information in MNase-seq data into a thermodynamic model of protein-DNA interaction. We use a correlation-based objective function to model the MNase-seq data and a Markov chain Monte Carlo method to maximize the function. Our results show that the inferred protein-DNA interaction landscape is concordant with the MNase-seq data and provides a mechanistic explanation for the experimentally collected MNase-seq fragments. Our framework is flexible and can easily incorporate other data sources. To demonstrate this flexibility, we use prior distributions to integrate experimentally measured protein concentrations.
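
The following sketch illustrates only the general idea of maximizing a correlation-based objective with a random-walk Metropolis scheme; it is not the dissertation's thermodynamic model, and all functions, parameters, and data are invented for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
positions = np.arange(500)
observed = np.exp(-0.5 * ((positions - 180) / 40.0) ** 2) + rng.normal(0, 0.1, 500)

def predicted_occupancy(params):
    # toy "model": a Gaussian occupancy bump parameterized by center and width
    center, width = params
    return np.exp(-0.5 * ((positions - center) / max(width, 1.0)) ** 2)

def objective(params):
    # Pearson correlation between predicted occupancy and observed coverage
    return np.corrcoef(predicted_occupancy(params), observed)[0, 1]

def mcmc_maximize(objective, init, n_iter=5000, step=5.0, temperature=0.05):
    current = np.asarray(init, float)
    current_score = objective(current)
    best, best_score = current.copy(), current_score
    for _ in range(n_iter):
        proposal = current + rng.normal(scale=step, size=current.shape)
        score = objective(proposal)
        # accept uphill moves always, downhill moves with a Boltzmann-like probability
        if score > current_score or rng.random() < np.exp((score - current_score) / temperature):
            current, current_score = proposal, score
            if score > best_score:
                best, best_score = current.copy(), score
    return best, best_score

print(mcmc_maximize(objective, init=[250.0, 20.0]))
```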

We also study the ability of DNase-seq data to position nucleosomes. Traditionally, DNase-seq has only been widely used to identify DNase hypersensitive sites, which tend to be open chromatin regulatory regions devoid of nucleosomes. We reveal for the first time that DNase-seq datasets also contain substantial information about nucleosome translational positioning, and that existing DNase-seq data can be used to infer nucleosome positions with high accuracy. We develop a Bayes-factor-based nucleosome scoring method to position nucleosomes using DNase-seq data. Our approach utilizes several effective strategies to extract nucleosome positioning signals from the noisy DNase-seq data, including jointly modeling data points across the nucleosome body and explicitly modeling the quadratic and oscillatory DNase I digestion pattern on nucleosomes. We show that our DNase-seq-based nucleosome map is highly consistent with previous high-resolution maps. We also show that the oscillatory DNase I digestion pattern is useful in revealing the nucleosome rotational context around TF binding sites.
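
A toy version of Bayes-factor scoring for nucleosome positioning is sketched below: each 147 bp window is scored by comparing a nucleosome-shaped digestion profile against a flat background. The Poisson likelihood and the cosine profile are simplifying assumptions made for illustration, not the method actually used in the work.

```python
import numpy as np
from scipy.stats import poisson

WINDOW = 147
# assumed oscillatory digestion profile across the nucleosome body (illustrative)
profile = 1.0 + 0.5 * np.cos(2 * np.pi * np.arange(WINDOW) / 10.3)

def log_bayes_factor(cut_counts, profile):
    depth = cut_counts.sum()
    nuc_rate = profile / profile.sum() * depth + 1e-9   # nucleosome model, scaled to depth
    bg_rate = np.full(WINDOW, depth / WINDOW) + 1e-9    # flat background model
    return poisson.logpmf(cut_counts, nuc_rate).sum() - poisson.logpmf(cut_counts, bg_rate).sum()

def scan(cuts, profile):
    """Score every candidate window start along a cut-count track."""
    scores = np.full(len(cuts), -np.inf)
    for i in range(len(cuts) - WINDOW + 1):
        scores[i] = log_bayes_factor(cuts[i:i + WINDOW], profile)
    return scores

track = np.random.default_rng(1).poisson(3, 1000)       # synthetic DNase I cut counts
print(np.argmax(scan(track, profile)))                   # best-scoring window start
```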

Finally, we present a state-space model (SSM) for jointly modeling different kinds of genomic data to provide an accurate view of the protein-DNA interaction landscape. We also provide an efficient expectation-maximization algorithm to learn model parameters from data. We first show in simulation studies that the SSM can effectively recover underlying true protein binding configurations. We then apply the SSM to model real genomic data (both DNase-seq and MNase-seq data). Through incrementally increasing the types of genomic data in the SSM, we show that different data types can contribute complementary information for the inference of protein binding landscape and that the most accurate inference comes from modeling all available datasets.
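
As a stand-in for the custom state-space model, the sketch below fits a Gaussian hidden Markov model by EM (Baum-Welch, via hmmlearn) over jointly binned DNase-seq and MNase-seq signals, with the hidden state playing the role of the local binding configuration. It only illustrates the latent-state, joint-emission, EM-learned structure on synthetic data; the dissertation's SSM and EM algorithm are its own.

```python
import numpy as np
from hmmlearn import hmm

rng = np.random.default_rng(2)
n_bins = 2000
# synthetic coverage tracks: two halves of the "genome" with different signal levels
dnase = rng.normal(loc=np.repeat([1.0, 4.0], n_bins // 2), scale=1.0)
mnase = rng.normal(loc=np.repeat([3.0, 1.0], n_bins // 2), scale=1.0)
X = np.column_stack([dnase, mnase])          # one row per genomic bin, two data types

model = hmm.GaussianHMM(n_components=3, covariance_type="diag", n_iter=100, random_state=0)
model.fit(X)                                  # Baum-Welch (an EM algorithm)
states = model.predict(X)                     # most likely latent-state path (Viterbi)
print(np.bincount(states))
```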

This dissertation provides a foundation for future research by taking a step toward genome-wide inference of the protein-DNA interaction landscape through data integration.

Relevance:

90.00%

Publisher:

Abstract:

BACKGROUND: The National Comprehensive Cancer Network and the American Society of Clinical Oncology have established guidelines for the treatment and surveillance of colorectal cancer (CRC), respectively. Considering these guidelines, an accurate and efficient method is needed to measure receipt of care. METHODS: The accuracy and completeness of Veterans Health Administration (VA) administrative data were assessed by comparing them with data manually abstracted during the Colorectal Cancer Care Collaborative (C4) quality improvement initiative for 618 patients with stage I-III CRC. RESULTS: The VA administrative data contained gender, marital status, and birth information for all patients, but race information was missing for 62.1% of patients. The percent agreement for demographic variables ranged from 98.1-100%. The kappa statistic for receipt of treatments ranged from 0.21 to 0.60, and there was 96.9% agreement for the date of surgical resection. The percentages of post-diagnosis surveillance events recorded in C4 that were also found in VA administrative data were 76.0% for colonoscopy, 84.6% for physician visits, and 26.3% for carcinoembryonic antigen (CEA) tests. CONCLUSIONS: VA administrative data are accurate and complete for non-race demographic variables, receipt of CRC treatment, colonoscopy, and physician visits, but alternative data sources may be necessary to capture patient race and receipt of CEA tests.
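
The two agreement measures reported (percent agreement and the kappa statistic) can be computed as sketched below; the data are invented placeholders, not study records.

```python
# Agreement between chart-abstracted (C4) and VA administrative indicators of treatment.
from sklearn.metrics import cohen_kappa_score

c4    = [1, 1, 0, 1, 0, 0, 1, 1, 0, 1]   # manually abstracted: treatment received?
admin = [1, 0, 0, 1, 0, 1, 1, 1, 0, 1]   # VA administrative data for the same patients

percent_agreement = sum(a == b for a, b in zip(c4, admin)) / len(c4)
kappa = cohen_kappa_score(c4, admin)      # chance-corrected agreement
print(f"agreement = {percent_agreement:.0%}, kappa = {kappa:.2f}")
```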

Relevance:

90.00%

Publisher:

Abstract:

This short position paper considers issues in developing Data Architecture for the Internet of Things (IoT) through the medium of an exemplar project, Domain Expertise Capture in Authoring and Development Environments (DECADE). A brief discussion sets the background for IoT and the development of the distinction between things and computers. The paper makes a strong argument to avoid reinvention of the wheel, to reuse approaches to distributed heterogeneous data architectures and the lessons learned from that work, and to apply them to this situation. DECADE requires an autonomous recording system, local data storage, a semi-autonomous verification model, a sign-off mechanism, and qualitative and quantitative analysis carried out when and where required through a web-service architecture based on ontology and analytic agents, with a self-maintaining ontology model. To develop this, we describe a web-service architecture combining a distributed data warehouse, web services for analysis agents, ontology agents and a verification engine, with a centrally verified outcome database maintained by a certifying body for qualification/professional status.
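
A minimal sketch of the kind of web-service layer argued for: a recording endpoint that stores captured activity locally and hands it to pluggable analysis agents. Endpoint names, the agent interface, and the storage shape are assumptions made for illustration, not the DECADE design.

```python
from flask import Flask, request, jsonify

app = Flask(__name__)
local_store = []          # stand-in for the local data storage component
analysis_agents = []      # ontology / analytic agents register themselves here

def register_agent(agent):
    analysis_agents.append(agent)
    return agent

@register_agent
def activity_summary_agent(record):
    """Toy quantitative-analysis agent: counts captured events in a record."""
    return {"agent": "activity_summary", "n_events": len(record.get("events", []))}

@app.route("/records", methods=["POST"])
def capture_record():
    record = request.get_json(force=True)
    record["verified"] = False            # awaits the semi-autonomous verification step
    local_store.append(record)
    analyses = [agent(record) for agent in analysis_agents]
    return jsonify({"stored": len(local_store), "analyses": analyses})

if __name__ == "__main__":
    app.run(port=5000)
```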

Relevance:

90.00%

Publisher:

Abstract:

To date, the processing of wildlife location data has relied on a diversity of software and file formats. Data management and the following spatial and statistical analyses were undertaken in multiple steps, involving many time-consuming importing/exporting phases. Recent technological advancements in tracking systems have made large, continuous, high-frequency datasets of wildlife behavioral data available, such as those derived from the global positioning system (GPS) and other animal-attached sensor devices. These data can be further complemented by a wide range of other information about the animals’ environment. Management of these large and diverse datasets for modelling animal behaviour and ecology can prove challenging, slowing down analysis and increasing the probability of mistakes in data handling. We address these issues by critically evaluating the requirements for good management of GPS data for wildlife biology. We highlight that dedicated data management tools and expertise are needed. We explore current research in wildlife data management. We suggest a general direction of development, based on a modular software architecture with a spatial database at its core, where interoperability, data model design and integration with remote-sensing data sources play an important role in successful GPS data handling.
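
The "spatial database at the core" idea can be sketched as follows: raw GPS fixes land in one table and analyses query the database rather than shuffling files between tools. SQLite stands in here for the spatial database (a real deployment would more likely use PostgreSQL/PostGIS), and the schema is illustrative only.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE gps_fix (
    animal_id TEXT,
    acquired_at TEXT,          -- ISO 8601 timestamp from the collar
    lon REAL, lat REAL,
    dop REAL                   -- dilution of precision, used for quality screening
);
CREATE TABLE env_attribute (   -- values joined in from remote-sensing layers
    animal_id TEXT, acquired_at TEXT, ndvi REAL
);
""")
db.execute("INSERT INTO gps_fix VALUES ('roe_01', '2012-05-01T04:00:00', 11.05, 46.01, 2.1)")

# Example downstream query: daily number of usable fixes per animal.
rows = db.execute("""
    SELECT animal_id, date(acquired_at) AS day, COUNT(*) AS n_fixes
    FROM gps_fix
    WHERE dop < 5
    GROUP BY animal_id, day
""").fetchall()
print(rows)
```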

Relevance:

90.00%

Publisher:

Abstract:

Objective: Several surveillance definitions of influenza-like illness (ILI) have been proposed, based on the presence of symptoms. Symptom data can be obtained from patients, medical records, or both. Past research has found that agreement between health record data and self-report varies depending on the specific symptom. Therefore, we aimed to explore the implications of using data on influenza symptoms extracted from medical records, similar data collected prospectively from outpatients, and the combined data from both sources as predictors of laboratory-confirmed influenza. Methods: Using data from the Hutterite Influenza Prevention Study, we calculated: 1) the sensitivity, specificity and predictive values of individual symptoms within surveillance definitions; 2) how frequently surveillance definitions correlated with laboratory-confirmed influenza; and 3) the predictive value of surveillance definitions. Results: Of the 176 participants with symptom data from both self-report and medical records, 142 (81%) were tested for influenza and 37 (26%) were PCR positive for influenza. Fever (alone) and fever combined with cough and/or sore throat were highly correlated with being PCR positive for influenza for all data sources. ILI surveillance definitions based on symptom data from medical records only, or from both medical records and self-report, were better predictors of laboratory-confirmed influenza, with higher odds ratios and positive predictive values. Discussion: The choice of data source to determine ILI will depend on the patient population, outcome of interest, availability of data source, and use for clinical decision making, research, or surveillance. © Canadian Public Health Association, 2012.
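
The test characteristics referred to (sensitivity, specificity, predictive values) come from a standard 2x2 comparison of each ILI definition against PCR results, as sketched below with invented counts.

```python
def test_characteristics(tp, fp, fn, tn):
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    ppv = tp / (tp + fp)          # positive predictive value
    npv = tn / (tn + fn)          # negative predictive value
    return sensitivity, specificity, ppv, npv

# e.g. a "fever with cough and/or sore throat" definition applied to medical-record data
# (counts are placeholders, not the study's results):
sens, spec, ppv, npv = test_characteristics(tp=28, fp=20, fn=9, tn=85)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f} NPV={npv:.2f}")
```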

Relevance:

90.00%

Publisher:

Abstract:

Objectives: To assess whether open angle glaucoma (OAG) screening meets the UK National Screening Committee criteria, to compare screening strategies with case finding, to estimate test parameters, to model estimates of cost and cost-effectiveness, and to identify areas for future research. Data sources: Major electronic databases were searched up to December 2005. Review methods: Screening strategies were developed by wide consultation. Markov submodels were developed to represent screening strategies. Parameter estimates were determined by systematic reviews of epidemiology, economic evaluations of screening, and effectiveness (test accuracy, screening and treatment). Tailored highly sensitive electronic searches were undertaken. Results: Most potential screening tests reviewed had an estimated specificity of 85% or higher. No test was clearly most accurate, with only a few, heterogeneous studies for each test. No randomised controlled trials (RCTs) of screening were identified. Based on two treatment RCTs, early treatment reduces the risk of progression. Extrapolating from this, and assuming accelerated progression with advancing disease severity, without treatment the mean time to blindness in at least one eye was approximately 23 years, compared to 35 years with treatment. Prevalence would have to be about 3-4% in 40 year olds with a screening interval of 10 years to approach cost-effectiveness. It is predicted that screening might be cost-effective in a 50-year-old cohort at a prevalence of 4% with a 10-year screening interval. General population screening at any age, thus, appears not to be cost-effective. Selective screening of groups with higher prevalence (family history, black ethnicity) might be worthwhile, although this would only cover 6% of the population. Extension to include other at-risk cohorts (e.g. myopia and diabetes) would include 37% of the general population, but the prevalence is then too low for screening to be considered cost-effective. Screening using a test with initial automated classification followed by assessment by a specialised optometrist, for test positives, was more cost-effective than initial specialised optometric assessment. The cost-effectiveness of the screening programme was highly sensitive to the perspective on costs (NHS or societal). In the base-case model, the NHS costs of visual impairment were estimated as £669. If annual societal costs were £8800, then screening might be considered cost-effective for a 40-year-old cohort with 1% OAG prevalence assuming a willingness to pay of £30,000 per quality-adjusted life-year. Of lesser importance were changes to estimates of attendance for sight tests, incidence of OAG, rate of progression and utility values for each stage of OAG severity. Cost-effectiveness was not particularly sensitive to the accuracy of screening tests within the ranges observed. However, a highly specific test is required to reduce large numbers of false-positive referrals. The findings that population screening is unlikely to be cost-effective are based on an economic model whose parameter estimates have considerable uncertainty, in particular, if rate of progression and/or costs of visual impairment are higher than estimated then screening could be cost-effective. Conclusions: While population screening is not cost-effective, the targeted screening of high-risk groups may be. 
Procedures for identifying those at risk, for quality assuring the programme, as well as adequate service provision for those screened positive would all be needed. Glaucoma detection can be improved by increasing attendance for eye examination, and improving the performance of current testing by either refining practice or adding in a technology-based first assessment, the latter being the more cost-effective option. This has implications for any future organisational changes in community eye-care services. Further research should aim to develop and provide quality data to populate the economic model, by conducting a feasibility study of interventions to improve detection, by obtaining further data on costs of blindness, risk of progression and health outcomes, and by conducting an RCT of interventions to improve the uptake of glaucoma testing. © Queen's Printer and Controller of HMSO 2007. All rights reserved.
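
A toy Markov cohort model of the general kind used in such evaluations is sketched below: health states, an annual transition matrix, per-state costs and utilities, discounting, and an incremental cost per QALY comparing screening with no screening. Every number is a placeholder chosen for illustration; none comes from the report's model.

```python
import numpy as np

states = ["mild_OAG", "moderate_OAG", "severe_OAG", "visually_impaired"]
P_no_screen = np.array([      # annual transition probabilities without screening (placeholders)
    [0.90, 0.10, 0.00, 0.00],
    [0.00, 0.88, 0.12, 0.00],
    [0.00, 0.00, 0.90, 0.10],
    [0.00, 0.00, 0.00, 1.00],
])
P_screen = np.array([         # earlier detection and treatment assumed to slow progression
    [0.94, 0.06, 0.00, 0.00],
    [0.00, 0.93, 0.07, 0.00],
    [0.00, 0.00, 0.94, 0.06],
    [0.00, 0.00, 0.00, 1.00],
])
cost = np.array([150.0, 300.0, 500.0, 8800.0])     # annual cost per state, GBP (placeholders)
utility = np.array([0.95, 0.90, 0.75, 0.50])       # QALY weight per state (placeholders)

def run(P, extra_cost_per_year=0.0, years=30, discount=0.035):
    cohort = np.array([1.0, 0.0, 0.0, 0.0])        # everyone starts in mild OAG
    total_cost = total_qaly = 0.0
    for t in range(years):
        d = (1 + discount) ** -t                   # discount factor for year t
        total_cost += d * (cohort @ cost + extra_cost_per_year)
        total_qaly += d * (cohort @ utility)
        cohort = cohort @ P                        # advance the cohort one annual cycle
    return total_cost, total_qaly

c0, q0 = run(P_no_screen)
c1, q1 = run(P_screen, extra_cost_per_year=30.0)   # assumed annual screening-programme cost
print(f"ICER = {(c1 - c0) / (q1 - q0):,.0f} GBP per QALY")
```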

Relevance:

90.00%

Publisher:

Abstract:

Master data management (MDM) integrates data from multiple structured data sources and builds a consolidated 360-degree view of business entities such as customers and products. Today’s MDM systems are not prepared to integrate information from unstructured data sources, such as news reports, emails, call-center transcripts, and chat logs. However, those unstructured data sources may contain valuable information about the same entities known to MDM from the structured data sources. Integrating information from unstructured data into MDM is challenging as textual references to existing MDM entities are often incomplete and imprecise and the additional entity information extracted from text should not impact the trustworthiness of MDM data.

In this paper, we present an architecture for making MDM text-aware and showcase its implementation as IBM InfoSphere MDM Extension for Unstructured Text Correlation, an add-on to IBM InfoSphere Master Data Management Standard Edition. We highlight how MDM benefits from additional evidence found in documents when doing entity resolution and relationship discovery. We experimentally demonstrate the feasibility of integrating information from unstructured data sources into MDM.
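
The core correlation step can be sketched as follows: an entity mention extracted from text (often incomplete or imprecise) is matched against master records with a similarity score, and low-confidence links are kept aside rather than altering trusted MDM attributes. Plain string similarity stands in here for the product's much richer resolution logic; all names and thresholds are invented.

```python
from difflib import SequenceMatcher

master_records = [
    {"id": "C-1001", "name": "Johnathan Smithers", "city": "Austin"},
    {"id": "C-1002", "name": "Joan Smith", "city": "Boston"},
]

def similarity(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def correlate_mention(mention_name, mention_city=None, threshold=0.75):
    scored = []
    for rec in master_records:
        score = similarity(mention_name, rec["name"])
        if mention_city and mention_city.lower() == rec["city"].lower():
            score = min(1.0, score + 0.10)      # corroborating context from the document
        scored.append((score, rec))
    score, rec = max(scored, key=lambda s: s[0])
    # Below threshold: keep as an unresolved mention instead of touching master data.
    return {"master_id": rec["id"], "confidence": score} if score >= threshold else None

print(correlate_mention("Jon Smithers", mention_city="Austin"))   # e.g. an email signature
```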

Relevance:

90.00%

Publisher:

Abstract:

AIM: The aim of this study was to explore the concepts of 'resilience' and 'hardiness' in nursing and midwifery students in educational settings and to identify educational interventions to promote resilience.

BACKGROUND: Resilience in healthcare professionals has gained increasing attention globally, yet to date resilience and resilience education in nursing and midwifery students remain largely under-researched.

DESIGN: An integrative literature review was planned; however, only quantitative evidence was identified, so a review of quantitative studies was undertaken using a systematic approach.

DATA SOURCES: A comprehensive search was undertaken using the Medline, CINAHL, Embase, PsycINFO and Maternity and Infant Care databases, covering January 1980 to February 2015.

REVIEW METHODS: Data were extracted using a specifically designed form and quality assessed using an appropriate checklist. A narrative summary of findings and statistical outcomes was undertaken.

RESULTS: Eight quantitative studies were included. Research relating to resilience and resilience education in nursing and midwifery students is sparse. There is weak evidence that resilience and hardiness are associated with slightly improved academic performance and decreased burnout. However, studies were heterogeneous in design and limited by poor methodological quality. No study specifically considered student midwives.

CONCLUSION: A greater understanding of the theoretical underpinnings of resilience in nursing and midwifery students is essential for the development of educational resources. It is imperative that future research considers both nursing and midwifery training cohorts and is of strong methodological quality.

Relevance:

90.00%

Publisher:

Abstract:

Information and communication technologies in healthcare are not merely an instrument for good information management, but a strategic factor for more efficient and safer care delivery. Information technologies are a pillar for health systems to evolve towards a citizen-centred model, in which a comprehensive set of patient information should be automatically available to the teams providing care, regardless of where it was generated (geographic location or system). This kind of safe, aggregated use of clinical information is undermined by the widespread fragmentation of health information system implementations. Several approaches have been proposed to overcome the limitations arising from these “islands of information” in healthcare, ranging from full centralization (a single system) to the use of decentralized networks for exchanging clinical messages. In this work, we propose a service-based unification layer built by federating heterogeneous information sources. This clinical information aggregator provides the foundation needed to develop applications with a regional scope, which we demonstrated by implementing a virtual electronic health record system. Unlike the point-to-point clinical messaging methods popular in health systems integration, we developed a middleware following J2EE architectural standards, in which the federated information is expressed as an object model accessible through programming interfaces. The proposed architecture was instantiated in the Rede Telemática de Saúde, a platform deployed in the Aveiro region that links eight partner institutions (two hospitals and six health centres), covering ~350,000 citizens, used by ~350 registered professionals, and providing access to more than 19,000,000 episodes. Beyond the regional collaborative health platform (RTSys), we introduce a second line of research that seeks to bridge networks for care delivery and networks for scientific computing. In this second scenario, we propose the use of Grid computing models to enable the large-scale use and integration of biomedical information. The proposed architecture (not implemented) allows access to existing e-Science infrastructures to create clinical information repositories for health applications.
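
A compact sketch of the unification idea described above: per-institution adapters expose a common interface and an aggregator federates their answers into one patient-centred object model. Class and method names are illustrative, and the actual RTSys middleware is J2EE-based rather than Python.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field

@dataclass
class Episode:
    source: str
    date: str
    summary: str

@dataclass
class PatientRecord:
    patient_id: str
    episodes: list = field(default_factory=list)

class SourceAdapter(ABC):
    """One adapter per partner institution (hospital or health centre)."""
    @abstractmethod
    def episodes_for(self, patient_id: str) -> list:
        ...

class InstitutionAdapter(SourceAdapter):
    def __init__(self, name, backend):
        self.name, self.backend = name, backend          # backend: institution-specific access
    def episodes_for(self, patient_id):
        return [Episode(self.name, d, s) for d, s in self.backend.get(patient_id, [])]

class RegionalAggregator:
    def __init__(self, adapters):
        self.adapters = adapters
    def virtual_record(self, patient_id) -> PatientRecord:
        record = PatientRecord(patient_id)
        for adapter in self.adapters:                    # federate the sources, never copy them
            record.episodes.extend(adapter.episodes_for(patient_id))
        record.episodes.sort(key=lambda e: e.date)
        return record

backend_a = {"P42": [("2011-03-02", "Emergency visit, chest pain")]}
backend_b = {"P42": [("2011-03-10", "Cardiology follow-up")]}
aggregator = RegionalAggregator([InstitutionAdapter("Hospital A", backend_a),
                                 InstitutionAdapter("Health Centre B", backend_b)])
print(aggregator.virtual_record("P42"))
```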

Relevance:

90.00%

Publisher:

Abstract:

The rapid evolution and proliferation of a world-wide computerized network, the Internet, resulted in an overwhelming and constantly growing amount of publicly available data and information, a trend also observed in biomedicine. However, the lack of structure of textual data inhibits its direct processing by computational solutions. Information extraction is the task of text mining that aims to automatically collect information from unstructured text data sources. The goal of the work described in this thesis was to build innovative solutions for biomedical information extraction from scientific literature, through the development of simple software artifacts for developers and biocurators, delivering more accurate, usable and faster results. We started by tackling named entity recognition - a crucial initial task - with the development of Gimli, a machine-learning-based solution that follows an incremental approach to optimize extracted linguistic characteristics for each concept type. Afterwards, Totum was built to harmonize concept names provided by heterogeneous systems, delivering a robust solution with improved performance results. Such an approach takes advantage of heterogeneous corpora to deliver cross-corpus harmonization that is not constrained to specific characteristics. Since previous solutions do not provide links to knowledge bases, Neji was built to streamline the development of complex and custom solutions for biomedical concept name recognition and normalization. This was achieved through a modular and flexible framework focused on speed and performance, integrating a large amount of processing modules optimized for the biomedical domain. To offer on-demand heterogeneous biomedical concept identification, we developed BeCAS, a web application, service and widget. We also tackled relation mining by developing TrigNER, a machine-learning-based solution for biomedical event trigger recognition, which applies an automatic algorithm to obtain the best linguistic features and model parameters for each event type. Finally, in order to assist biocurators, Egas was developed to support rapid, interactive and real-time collaborative curation of biomedical documents, through manual and automatic in-line annotation of concepts and relations. Overall, the research work presented in this thesis contributed to a more accurate update of current biomedical knowledge bases, towards improved hypothesis generation and knowledge discovery.
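
As an illustration of the first task the thesis addresses, the sketch below trains a token-level BIO classifier for gene mentions from simple lexical features; it is a generic example, not Gimli's or Neji's actual features, models, or pipeline.

```python
from sklearn.feature_extraction import DictVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def token_features(tokens, i):
    tok = tokens[i]
    return {
        "lower": tok.lower(),
        "is_upper": tok.isupper(),
        "has_digit": any(c.isdigit() for c in tok),
        "suffix3": tok[-3:],
        "prev": tokens[i - 1].lower() if i > 0 else "<s>",
        "next": tokens[i + 1].lower() if i < len(tokens) - 1 else "</s>",
    }

# Tiny hand-labelled training set (illustrative only); real systems train on annotated corpora.
train_sentences = [
    (["BRCA1", "mutations", "increase", "cancer", "risk"], ["B-GENE", "O", "O", "O", "O"]),
    (["Expression", "of", "TP53", "was", "reduced"],       ["O", "O", "B-GENE", "O", "O"]),
]
X = [token_features(toks, i) for toks, tags in train_sentences for i in range(len(toks))]
y = [tag for _, tags in train_sentences for tag in tags]

model = make_pipeline(DictVectorizer(), LogisticRegression(max_iter=1000))
model.fit(X, y)

test = ["Knockdown", "of", "MYC", "alters", "proliferation"]
print(list(zip(test, model.predict([token_features(test, i) for i in range(len(test))]))))
```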