848 results for databases and data mining
Abstract:
Presentations sponsored by the Patent and Trademark Depository Library Association (PTDLA) at the American Library Association Annual Conference, New Orleans, June 25, 2006.

Speaker #1: Nan Myers, Associate Professor; Government Documents, Patents and Trademarks Librarian, Wichita State University, Wichita, KS. Title: Intellectual Property Roundup: Copyright, Trademarks, Trade Secrets, and Patents. Abstract: This presentation provides a capsule overview of the distinctive coverage of the four types of intellectual property: what they are, why they are important, how to get them, what they cost, and how long they last. Emphasis will be on the questions patrons ask most, along with the answers. It also covers the mission of Patent & Trademark Depository Libraries (PTDLs) and other sources of business information outside libraries, such as Small Business Development Centers.

Speaker #2: Jan Comfort, Government Information Reference Librarian, Clemson University, Clemson, SC. Title: Patents as a Source of Competitive Intelligence Information. Abstract: Large corporations often have R&D departments, or large numbers of staff whose job is to monitor their competitors' activities. This presentation will review strategies that small business owners can use to do their own competitive intelligence analysis. The focus will be on features of the patent database available free of charge on the USPTO website, as well as commercial databases available at many public and academic libraries across the country.

Speaker #3: Virginia Baldwin, Professor; Engineering Librarian, University of Nebraska-Lincoln, Lincoln, NE. Title: Mining Online Patent Data for Business Information. Abstract: The United States Patent and Trademark Office (USPTO) website and the websites of international databases contain information about granted patents and patent applications and the technologies they represent. Statistical information about patents, their technologies, geographical information, and patenting entities is compiled and available as reports on the USPTO website. Other valuable information from these websites can be obtained using data mining techniques. This presentation will provide the keys to opening these resources and obtaining valuable data.

Speaker #4: Donna Hopkins, Engineering Librarian, Rensselaer Polytechnic Institute, Troy, NY. Title: Searching the USPTO Trademark Database for Wordmarks and Logos. Abstract: This presentation provides an overview of wordmark searching at www.uspto.gov, followed by a review of techniques for searching non-word US trademarks using codes from the Design Search Code Manual. These codes are used in an electronic search, either on the USPTO website or on CASSIS DVDs. The search is sometimes supplemented by consulting the Official Gazette. A specific example of searching with a section of the codes is included. Similar searches of WIPO's Madrid Express database, using the Vienna Classification, will also be briefly described.
Abstract:
Background: Applying the knowledge produced by science to promote human health is the main goal of translational medicine. Achieving this requires computational methods that can handle the large amount of information generated from bench to bedside and deal with its heterogeneity. One computational challenge is integrating clinical, socio-demographic and biological data. In this effort, ontologies play an essential role as a powerful artifact for knowledge representation. Chado is a modular, ontology-oriented database model that has gained popularity for its robustness and flexibility as a generic platform for storing biological data; however, it lacks support for representing clinical and socio-demographic information. Results: We implemented an extension of Chado, the Clinical Module, to allow the representation of this kind of information. Our approach consists of a framework for data integration through a common reference ontology. The framework has four levels: the data level, which stores the data; the semantic level, which integrates and standardizes the data using ontologies; the application level, which manages clinical databases, ontologies and the data integration process; and the web interface level, which allows interaction between the user and the system. The Clinical Module was built on the Entity-Attribute-Value (EAV) model. We also proposed a methodology for migrating data from legacy clinical databases to the integrative framework. A Chado instance was initialized using a relational database management system. The Clinical Module was implemented, and the framework was loaded with data from a real clinical research database. Clinical and demographic data, as well as biomaterial data, were obtained from patients with head and neck tumors.
We implemented IPTrans, a complete environment for data migration, which comprises: the construction of an ontology-based model describing the legacy clinical data; the Extraction, Transformation and Load (ETL) process, which extracts the data from the source clinical database and loads it into the Clinical Module of Chado; and the development of a web tool and a Bridge Layer that adapts the web tool, as well as other applications, to Chado. Conclusions: The open-source computational solutions currently available for translational science lack a model for representing biomolecular information and are not integrated with existing bioinformatics tools. Conversely, existing genomic data models do not represent clinical patient data. A framework was developed to support translational research by integrating biomolecular information from different “omics” technologies with patients' clinical and socio-demographic data. The framework should provide three features: flexibility, compression and robustness. Experiments with a use case showed that the proposed system meets the requirements of flexibility and robustness, achieving the desired integration. The Clinical Module can be accessed at http://dcm.ffclrp.usp.br/caib/pg=iptrans.
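The abstract states that the Clinical Module is built on the Entity-Attribute-Value (EAV) model. As a rough illustration of that general pattern only (not the module's actual Chado tables), the sketch below stores each clinical fact as one (entity, attribute, value) row in an in-memory SQLite database and pivots the rows back into per-patient records. The table name `clinical_eav`, its columns, and the sample values are hypothetical.

```python
import sqlite3
from collections import defaultdict

# Hypothetical EAV table: one row per (patient, attribute) fact.
# This is the generic EAV pattern, NOT the Chado Clinical Module schema.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE clinical_eav ("
    " patient_id TEXT NOT NULL,"
    " attribute TEXT NOT NULL,"
    " value TEXT,"
    " PRIMARY KEY (patient_id, attribute))"
)

# Illustrative facts: adding a new attribute needs no schema change,
# which is the flexibility EAV is usually chosen for.
rows = [
    ("P001", "age", "57"),
    ("P001", "sex", "M"),
    ("P001", "tumor_site", "larynx"),
    ("P002", "age", "63"),
    ("P002", "smoking_status", "former"),
]
conn.executemany("INSERT INTO clinical_eav VALUES (?, ?, ?)", rows)


def pivot(connection):
    """Rebuild one dict per patient from the EAV rows."""
    records = defaultdict(dict)
    for pid, attr, val in connection.execute(
        "SELECT patient_id, attribute, value FROM clinical_eav "
        "ORDER BY patient_id, attribute"
    ):
        records[pid][attr] = val
    return dict(records)


patients = pivot(conn)
print(patients["P001"])  # {'age': '57', 'sex': 'M', 'tumor_site': 'larynx'}
```

The trade-off is that every value is stored as text and each patient may carry a different attribute set, so typing and validation move from the schema into the application or ontology layer.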
Abstract:
OBJECTIVE: To describe the electronic medical databases used in antiretroviral therapy (ART) programmes in lower-income countries and assess the measures such programmes employ to maintain and improve data quality and reduce the loss of patients to follow-up. METHODS: In 15 countries in Africa, South America and Asia, a survey was conducted from December 2006 to February 2007 on the use of electronic medical record systems in ART programmes. Patients enrolled in the sites at the time of the survey but not seen during the previous 12 months were considered lost to follow-up. The quality of the data was assessed by computing the percentage of missing key variables (age, sex, clinical stage of HIV infection, CD4+ lymphocyte count and year of ART initiation). Associations of site characteristics (such as the number of staff members dedicated to data management) and of measures to reduce loss to follow-up (such as the presence of staff dedicated to tracing patients) with data quality and loss to follow-up were analysed using multivariate logit models. FINDINGS: Twenty-one sites that together provided ART to 50 060 patients were included (median number of patients per site: 1000; interquartile range, IQR: 72-19 320). Eighteen sites (86%) used an electronic database for medical record-keeping; 15 (83%) of these relied on software intended for personal or small business use. The median percentage of missing data for key variables per site was 10.9% (IQR: 2.0-18.9%) and declined with training in data management (odds ratio, OR: 0.58; 95% confidence interval, CI: 0.37-0.90) and with weekly hours spent by a clerk on the database per 100 patients on ART (OR: 0.95; 95% CI: 0.90-0.99). About 10 weekly hours per 100 patients on ART were required to reduce missing data for key variables to below 10%. The median percentage of patients lost to follow-up 1 year after starting ART was 8.5% (IQR: 4.2-19.7%).
Strategies to reduce loss to follow-up included outreach teams, community-based organizations and checking death registry data. Implementation of all three strategies substantially reduced losses to follow-up (OR: 0.17; 95% CI: 0.15-0.20). CONCLUSION: The quality of the data collected and the retention of patients in ART treatment programmes are unsatisfactory for many sites involved in the scale-up of ART in resource-limited settings, mainly because of insufficient staff trained to manage data and trace patients lost to follow-up.
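The METHODS define two per-site measures: the percentage of missing values across five key variables, and loss to follow-up, meaning enrolled patients not seen during the previous 12 months. A minimal sketch of both computations follows, assuming each patient is a dict with hypothetical field names and `None` marking a missing value; the 365-day window is an assumed approximation of "12 months", and the records are made up for illustration.

```python
from datetime import date, timedelta

# Key variables named in the abstract; the field names are hypothetical.
KEY_VARIABLES = ("age", "sex", "clinical_stage", "cd4_count", "art_start_year")


def percent_missing(patients):
    """Percentage of missing (None) values over all key variables and patients."""
    total = len(patients) * len(KEY_VARIABLES)
    if total == 0:
        return 0.0
    missing = sum(
        1 for p in patients for var in KEY_VARIABLES if p.get(var) is None
    )
    return 100.0 * missing / total


def lost_to_follow_up(patients, survey_date, window_days=365):
    """Patients whose last visit precedes the window (approximating 12 months)."""
    cutoff = survey_date - timedelta(days=window_days)
    return [p for p in patients if p["last_visit"] < cutoff]


site = [
    {"age": 34, "sex": "F", "clinical_stage": 3, "cd4_count": 180,
     "art_start_year": 2005, "last_visit": date(2006, 11, 2)},
    {"age": None, "sex": "M", "clinical_stage": None, "cd4_count": 95,
     "art_start_year": 2004, "last_visit": date(2005, 6, 15)},
]

print(percent_missing(site))  # 20.0 (2 of 10 key values missing)
print(len(lost_to_follow_up(site, survey_date=date(2007, 1, 15))))  # 1
```

Pooling missing values across all five variables gives one quality score per site, which is the kind of site-level outcome the abstract relates to staffing and training.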
Abstract:
Pteropods are a group of holoplanktonic gastropods whose global biomass distribution patterns remain poorly resolved. The aim of this study was to collect and synthesize existing pteropod (Gymnosomata, Thecosomata and Pseudothecosomata) abundance and biomass data in order to evaluate the global distribution of pteropod carbon biomass, with particular emphasis on its seasonal, temporal and vertical patterns. We collected 25 902 data points from several online databases and a number of scientific articles. The biomass data have been gridded onto a 360 × 180 grid (1° × 1° resolution) with a vertical resolution of 33 WOA depth levels, and converted to NetCDF format. The data were collected between 1951 and 2010, with sampling depths ranging from 0 to 1000 m. Pteropod biomass data were either extracted directly or derived by converting abundance to biomass using pteropod-specific length-to-weight conversions. In the Northern Hemisphere (NH) the data were distributed evenly throughout the year, whereas sampling in the Southern Hemisphere (SH) was biased towards the austral summer months. Of all biomass values, 86% were located in the NH, most (42%) within the 30-50° N latitudinal band. Global biomass values spanned over three orders of magnitude, with a mean and median biomass concentration of 8.2 mg C l-1 (SD = 61.4) and 0.25 mg C l-1, respectively, for all data points, and a mean of 9.1 mg C l-1 (SD = 64.8) and a median of 0.25 mg C l-1 for non-zero biomass values. In the NH, the highest mean and median biomass concentrations were located between 40° and 50° N (mean: 68.8 mg C l-1 (SD = 213.4); median: 2.5 mg C l-1), while in the SH they were within the 70-80° S latitudinal band (mean: 10.5 mg C l-1 (SD = 38.8); median: 0.2 mg C l-1). Biomass values were lowest in equatorial regions.
A broad range of biomass concentrations was observed at all depths, with the biomass peak in the surface layer (0-25 m) and values generally decreasing with depth. However, biomass peaks occurred at different depths in different ocean basins: 0-25 m in the North Atlantic, 50-100 m in the Pacific, 100-200 m in the Arctic, 200-500 m in the Brazilian region and >500 m in the Indo-Pacific region. Biomass in the NH was relatively invariant over the seasonal cycle but more seasonally variable in the SH. The database provides modellers with a valuable tool for studying ecosystem processes and global biogeochemical cycles.
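A 360 × 180 global grid corresponds to 1° × 1° cells. The sketch below shows one common way to map a sample's latitude and longitude to a cell and average biomass per occupied cell. The indexing convention (row 0 at 90° S, column 0 at 180° W), the use of a plain arithmetic mean, and the sample values are assumptions, not details taken from the dataset; binning onto the 33 WOA depth levels is omitted.

```python
import math
from collections import defaultdict
from statistics import mean

N_LAT, N_LON = 180, 360  # 1-degree cells


def cell_index(lat, lon):
    """Map (lat, lon) in degrees to (row, col) on a 1-degree grid.

    Assumed convention: row 0 starts at 90 S, col 0 starts at 180 W.
    Longitudes are wrapped to [-180, 180), so 180 E falls in column 0;
    latitude 90 N is clamped into the last row.
    """
    if not -90 <= lat <= 90:
        raise ValueError("latitude out of range")
    lon = ((lon + 180) % 360) - 180
    row = min(math.floor(lat + 90), N_LAT - 1)
    col = math.floor(lon + 180)
    return row, col


def grid_mean(samples):
    """Average biomass per occupied cell; samples are (lat, lon, biomass)."""
    cells = defaultdict(list)
    for lat, lon, biomass in samples:
        cells[cell_index(lat, lon)].append(biomass)
    return {cell: mean(values) for cell, values in cells.items()}


samples = [(45.2, -30.7, 2.0), (45.8, -30.1, 4.0), (-75.5, 170.3, 0.2)]
print(grid_mean(samples))  # {(135, 149): 3.0, (14, 350): 0.2}
```

Given the strong skew the abstract reports (mean 8.2 versus median 0.25 mg C l-1), a per-cell median could be swapped in for `mean` to reduce the influence of a few very large values.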
Abstract:
This paper presents the results of a Secchi depth data mining study for the North Sea-Baltic Sea region. The study compiled 40,829 Secchi depth measurements from the area. Of these observations, 4.3% were found in the international data centers [the ICES Oceanographic Data Center in Denmark and the World Ocean Data Center A (WDC-A) in the USA], while the remaining 95.7% were provided by individuals and ocean research institutions in the countries surrounding the North Sea and the Baltic Sea. Inquiries at the World Ocean Data Center B (WDC-B) in Russia suggested that there could be significant additional holdings in that archive, but unfortunately no data could be made available. The earliest Secchi depth measurement retrieved in this study dates back to 1902 in the Baltic Sea, while the bulk of the measurements were gathered after 1970. The spatial distribution of Secchi depth measurements in the North Sea is very uneven, with surprisingly large sampling gaps in the western North Sea. Quarterly and annual Secchi depth maps at 0.5° × 0.5° spatial resolution are provided for the transition area between the North Sea and the Baltic Sea (4°E-16°E, 53°N-60°N).
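The quarterly maps bin observations into 0.5° × 0.5° cells inside 4°E-16°E, 53°N-60°N. A minimal sketch of that binning follows, assuming calendar quarters (January-March is Q1, and so on) and a cell value equal to the arithmetic mean of the Secchi depths that fall in it. The abstract does not describe the paper's actual averaging or interpolation method, and the observations below are made up.

```python
import math
from collections import defaultdict
from datetime import date

# Transition area from the abstract: 4E-16E, 53N-60N, 0.5-degree cells.
LON_MIN, LON_MAX = 4.0, 16.0
LAT_MIN, LAT_MAX = 53.0, 60.0
RES = 0.5


def quarter(d):
    """Calendar quarter 1-4 (assumed definition)."""
    return (d.month - 1) // 3 + 1


def quarterly_means(observations):
    """observations: iterable of (date, lat, lon, secchi_m).

    Returns {(quarter, row, col): mean Secchi depth in m}; points outside
    the box are skipped. Row 0 / col 0 is the south-west corner cell.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for when, lat, lon, secchi in observations:
        if not (LAT_MIN <= lat < LAT_MAX and LON_MIN <= lon < LON_MAX):
            continue
        row = math.floor((lat - LAT_MIN) / RES)
        col = math.floor((lon - LON_MIN) / RES)
        key = (quarter(when), row, col)
        sums[key] += secchi
        counts[key] += 1
    return {key: sums[key] / counts[key] for key in sums}


obs = [
    (date(1985, 7, 3), 55.6, 10.2, 6.0),
    (date(1985, 8, 21), 55.9, 10.4, 8.0),
    (date(1990, 2, 10), 57.1, 11.8, 9.5),
    (date(1990, 5, 5), 62.0, 5.0, 7.0),  # outside the box, skipped
]
print(quarterly_means(obs))  # {(3, 5, 12): 7.0, (1, 8, 15): 9.5}
```

Dropping the quarter from the key would pool all seasons into a single map; sparse cells (the abstract notes large gaps in the western North Sea) simply stay absent from the result rather than being filled.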
Abstract:
The HIV Reverse Transcriptase and Protease Sequence Database (http://hivdb.stanford.edu) is an online relational database that catalogs evolutionary and drug-related sequence variation in the human immunodeficiency virus (HIV) reverse transcriptase (RT) and protease enzymes, the molecular targets of anti-HIV therapy. The database contains a compilation of nearly all published HIV RT and protease sequences, including submissions from International Collaboration databases and sequences published in journal articles. Sequences are linked to data on the source of the sequence sample and the antiretroviral treatment history of the individual from whom the isolate was obtained. During the past year, 3,500 sequences were added, and the data model was expanded to include drug susceptibility data on sequenced isolates. Database content has also been integrated with didactic text and the output of two sequence analysis programs.