9 results for Text Mining
at Duke University
Abstract:
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
Abstract:
BACKGROUND: In recent years large bibliographic databases have made much of the published literature of biology available for searches. However, the capabilities of the search engines integrated into these databases for text-based bibliographic searches are limited. To enable searches that deliver the results expected by comparative anatomists, an underlying logical structure known as an ontology is required. DEVELOPMENT AND TESTING OF THE ONTOLOGY: Here we present the Mammalian Feeding Muscle Ontology (MFMO), a multi-species ontology focused on anatomical structures that participate in feeding and other oral/pharyngeal behaviors. A unique feature of the MFMO is that a simple, computable, definition of each muscle, which includes its attachments and innervation, is true across mammals. This construction mirrors the logical foundation of comparative anatomy and permits searches using language familiar to biologists. Further, it provides a template for muscles that will be useful in extending any anatomy ontology. The MFMO is developed to support the Feeding Experiments End-User Database Project (FEED, https://feedexp.org/), a publicly-available, online repository for physiological data collected from in vivo studies of feeding (e.g., mastication, biting, swallowing) in mammals. Currently the MFMO is integrated into FEED and also into two literature-specific implementations of Textpresso, a text-mining system that facilitates powerful searches of a corpus of scientific publications. We evaluate the MFMO by asking questions that test the ability of the ontology to return appropriate answers (competency questions). We compare the results of queries of the MFMO to results from similar searches in PubMed and Google Scholar. RESULTS AND SIGNIFICANCE: Our tests demonstrate that the MFMO is competent to answer queries formed in the common language of comparative anatomy, but PubMed and Google Scholar are not. 
Overall, our results show that by incorporating anatomical ontologies into searches, an expanded and anatomically comprehensive set of results can be obtained. The broader scientific and publishing communities should consider taking up the challenge of semantically enabled search capabilities.
Abstract:
BACKGROUND: Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets. RESULTS: Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using a program described here, SplicerAV. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV was able to perform both gene and isoform specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which recognized a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low and high grade breast tumors; with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets. CONCLUSIONS: Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. 
The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform specific mRNA changes directly associated with cancer survival.
Abstract:
BACKGROUND: The ability to write clearly and effectively is of central importance to the scientific enterprise. Encouraged by the success of simulation environments in other biomedical sciences, we developed WriteSim TCExam, an open-source, Web-based, textual simulation environment for teaching effective writing techniques to novice researchers. We shortlisted and modified an existing open-source application, TCExam, to serve as a textual simulation environment. After testing usability internally in our team, we conducted formal field usability studies with novice researchers. These were followed by formal surveys with researchers fitting the role of administrators and users (novice researchers). RESULTS: The development process was guided by feedback from usability tests within our research team. Online surveys and formal studies, involving members of the Research on Research group and selected novice researchers, show that the application is user-friendly. Additionally, it has been used to train 25 novice researchers in scientific writing to date and has generated encouraging results. CONCLUSION: WriteSim TCExam is the first Web-based, open-source textual simulation environment designed to complement traditional scientific writing instruction. While initial reviews by students and educators have been positive, a formal study is needed to measure its benefits in comparison to standard instructional methods.
Abstract:
Mountaintop mining (MTM) is the primary procedure for surface coal exploration within the central Appalachian region of the eastern United States, and it is known to contaminate streams in local watersheds. In this study, we measured the chemical and isotopic compositions of water samples from MTM-impacted tributaries and streams in the Mud River watershed in West Virginia. We systematically document the isotopic compositions of three major constituents: sulfur isotopes in sulfate (δ(34)SSO4), carbon isotopes in dissolved inorganic carbon (δ(13)CDIC), and strontium isotopes ((87)Sr/(86)Sr). The data show that δ(34)SSO4, δ(13)CDIC, Sr/Ca, and (87)Sr/(86)Sr measured in saline- and selenium-rich MTM impacted tributaries are distinguishable from those of the surface water upstream of mining impacts. These tracers can therefore be used to delineate and quantify the impact of MTM in watersheds. High Sr/Ca and low (87)Sr/(86)Sr characterize tributaries that originated from active MTM areas, while tributaries from reclaimed MTM areas had low Sr/Ca and high (87)Sr/(86)Sr. Leaching experiments of rocks from the watershed show that pyrite oxidation and carbonate dissolution control the solute chemistry with distinct (87)Sr/(86)Sr ratios characterizing different rock sources. We propose that MTM operations that access the deeper Kanawha Formation generate residual mined rocks in valley fills from which effluents with distinctive (87)Sr/(86)Sr and Sr/Ca imprints affect the quality of the Appalachian watersheds.
Abstract:
A tree-based dictionary learning model is developed for joint analysis of imagery and associated text. The dictionary learning may be applied directly to the imagery from patches, or to general feature vectors extracted from patches or superpixels (using any existing method for image feature extraction). Each image is associated with a path through the tree (from root to a leaf), and each of the multiple patches in a given image is associated with one node in that path. Nodes near the tree root are shared between multiple paths, representing image characteristics that are common among different types of images. Moving toward the leaves, nodes become specialized, representing details in image classes. If available, words (text) are also jointly modeled, with a path-dependent probability over words. The tree structure is inferred via a nested Dirichlet process, and a retrospective stick-breaking sampler is used to infer the tree depth and width.
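The stick-breaking idea behind the tree prior can be illustrated with a minimal sketch. It is not the authors' retrospective sampler: it assumes a fixed tree depth, a truncated number of children per node, and a single concentration parameter `alpha`, and the names `stick_breaking` and `sample_path` are hypothetical. It only shows how child weights arise from breaking a unit-length stick and how an image's root-to-leaf path could be drawn from them.

```python
import random


def stick_breaking(alpha, num_sticks, rng):
    """Return truncated stick-breaking weights and the unassigned leftover mass.

    Each break takes a Beta(1, alpha) fraction of what remains of a unit stick.
    The truncation to num_sticks pieces is an assumption made for illustration.
    """
    weights = []
    remaining = 1.0
    for _ in range(num_sticks):
        fraction = rng.betavariate(1.0, alpha)
        weights.append(remaining * fraction)
        remaining *= 1.0 - fraction
    return weights, remaining


def sample_path(alpha, depth, num_sticks, rng):
    """Draw one root-to-leaf path by picking a child index at every level.

    Fresh weights are drawn at each level because only one path is sampled;
    a full model would store the weights per node and share them across images.
    """
    path = []
    for _ in range(depth):
        weights, leftover = stick_breaking(alpha, num_sticks, rng)
        weights[-1] += leftover  # fold the truncated tail into the last child
        child = rng.choices(range(num_sticks), weights=weights, k=1)[0]
        path.append(child)
    return path


if __name__ == "__main__":
    rng = random.Random(0)
    print(sample_path(alpha=2.0, depth=3, num_sticks=5, rng=rng))
```

Smaller values of `alpha` concentrate mass on the first few children, so more paths share nodes near the root; larger values spread mass over more children and produce a wider tree.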
Abstract:
Selenium (Se) is a micronutrient necessary for the function of a variety of important enzymes; Se also exhibits a narrow range in concentrations between essentiality and toxicity. Oviparous vertebrates such as birds and fish are especially sensitive to Se toxicity, which causes reproductive impairment and defects in embryo development. Selenium occurs naturally in the Earth's crust, but it can be mobilized by a variety of anthropogenic activities, including agricultural practices, coal burning, and mining.
Mountaintop removal/valley fill (MTR/VF) coal mining is a form of surface mining found throughout central Appalachia in the United States that involves blasting off the tops of mountains to access underlying coal seams. Spoil rock from the mountain is placed into adjacent valleys, forming valley fills, which bury stream headwaters and negatively impact surface water quality. This research focused on the biological impacts of Se leached from MTR/VF coal mining operations located around the Mud River, West Virginia.
In order to assess the status of Se in a lotic (flowing) system such as the Mud River, surface water, insects, and fish samples including creek chub (Semotilus atromaculatus) and green sunfish (Lepomis cyanellus) were collected from a mining impacted site as well as from a reference site not impacted by mining. Analysis of samples from the mined site showed increased conductivity and Se in the surface waters compared to the reference site in addition to increased concentrations of Se in insects and fish. Histological analysis of mined site fish gills showed a lack of normal parasites, suggesting parasite populations may be disrupted due to poor water quality. X-ray absorption near edge spectroscopy techniques were used to determine the speciation of Se in insect and creek chub samples. Insects contained approximately 40-50% inorganic Se (selenate and selenite) and 50-60% organic Se (Se-methionine and Se-cystine) while fish tissues contained lower proportions of inorganic Se than insects, instead having higher proportions of organic Se in the forms of methyl-Se-cysteine, Se-cystine, and Se-methionine.
Otoliths, calcified inner ear structures, were also collected from Mud River creek chubs and green sunfish and analyzed for Se content using laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS). Significant differences were found between the two species of fish, based on the concentrations of otolith Se. Green sunfish otoliths from all sites contained background or low concentrations of otolith Se (< 1 µg/g) that were not significantly different between mined and unmined sites. In contrast, creek chub otoliths from the historically mined site contained much higher (≥ 5 µg/g, up to approximately 68 µg/g) concentrations of Se than for the same species in the unmined site or for the green sunfish. Otolith Se concentrations were related to muscle Se concentrations for creek chubs (R² = 0.54, p = 0.0002 for the last 20% of the otolith Se versus muscle Se) while no relationship was observed for green sunfish.
Additional experiments using biofilms grown in the Mud River showed increased Se in mined site biofilms compared to the reference site. When we fed fathead minnows (Pimephales promelas) on these biofilms in the laboratory, they accumulated higher concentrations of Se in liver and ovary tissues compared to fathead minnows fed on reference site biofilms. No differences in Se accumulation were found in muscle from either treatment group. Biofilms were also centrifuged and separated into filamentous green algae and the remaining diatom fraction. The majority of Se was found in the diatom fraction, with only about one third of the total biofilm Se concentration present in the filamentous green algae fraction.
Finally, zebrafish (Danio rerio) embryos were exposed to aqueous Se in the form of selenate, selenite, and L-selenomethionine in an attempt to determine if oxidative stress plays a role in selenium embryo toxicity. Selenate and selenite exposure did not induce embryo deformities (lordosis and craniofacial malformation). L-selenomethionine, however, induced significantly higher deformity rates at 100 µg/L compared to controls. Antioxidant rescue of L-selenomethionine-induced deformities was attempted in embryos using N-acetylcysteine (NAC). Pretreatment with NAC significantly reduced deformities in the zebrafish embryos secondarily treated with L-selenomethionine, suggesting that oxidative stress may play a role in Se toxicity. Selenite exposure also induced a 6.6-fold increase in glutathione-S-transferase pi class 2 gene expression, which is involved in xenobiotic transformation. No changes in gene expression were observed for selenate or L-selenomethionine-exposed embryos.
The findings in this dissertation contribute to the understanding of how Se bioaccumulates in a lotic system and is transferred through a simulated foodweb in addition to further exploring oxidative stress as a potential mechanism for Se-induced embryo toxicity. Future studies should continue to pursue the role of oxidative stress and other mechanisms in Se toxicity and the biotransformation of Se in aquatic ecosystems.
Abstract:
Many factors such as poverty, ineffective institutions and environmental regulations may prevent developing countries from managing how natural resources are extracted to meet a strong market demand. Extraction for some resources has reached such proportions that evidence is measurable from space. We present recent evidence of the global demand for a single commodity and the ecosystem destruction resulting from commodity extraction, recorded by satellites for one of the most biodiverse areas of the world. We find that since 2003, recent mining deforestation in Madre de Dios, Peru is increasing nonlinearly alongside a constant annual rate of increase in international gold price (∼18%/yr). We detect that the new pattern of mining deforestation (1915 ha/year, 2006-2009) is outpacing that of nearby settlement deforestation. We show that gold price is linked with exponential increases in Peruvian national mercury imports over time (R(2) = 0.93, p = 0.04, 2003-2009). Given the past rates of increase we predict that mercury imports may more than double for 2011 (∼500 t/year). Virtually all of Peru's mercury imports are used in artisanal gold mining. Much of the mining increase is unregulated/artisanal in nature, lacking environmental impact analysis or miner education. As a result, large quantities of mercury are being released into the atmosphere, sediments and waterways. Other developing countries endowed with gold deposits are likely experiencing similar environmental destruction in response to recent record high gold prices. The increasing availability of satellite imagery ought to evoke further studies linking economic variables with land use and cover changes on the ground.
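To give a feel for what a constant annual increase of about 18% implies, the short calculation below treats the gold price as compounding once per year. That compounding assumption and the function names are illustrative only; the abstract reports the rate, not the model used to fit it.

```python
import math


def doubling_time(annual_rate):
    """Years needed for a quantity to double at a constant compound annual rate."""
    return math.log(2.0) / math.log(1.0 + annual_rate)


def cumulative_factor(annual_rate, years):
    """Total growth factor after the given number of years of compound growth."""
    return (1.0 + annual_rate) ** years


if __name__ == "__main__":
    rate = 0.18  # approximate annual gold price increase quoted in the abstract
    print(f"doubling time: {doubling_time(rate):.1f} years")
    print(f"growth factor over 2003-2009: {cumulative_factor(rate, 2009 - 2003):.2f}")
```

Under these assumptions the price doubles roughly every 4.2 years and grows about 2.7-fold over the six years from 2003 to 2009.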
Abstract:
This is the second installment of a three-part project to publish a group of ten Ptolemaic papyri purchased by Yale’s Beinecke Library in 1998 (acquisition “1998b”), which came to the Beinecke as three hard wads that were apparently the stuffing from the stomach cavity of a mummified animal. This article publishes: (1) P.CtYBR inv. 5019, a fragment of line ends in iambic tetrameter catalectic meter from an unknown comedy; the format suggests that this is a further example of a certain type of Ptolemaic writing exercise. (2) P.CtYBR inv. 5043, a fragmentary grammatical text of uncertain import.