838 results for text and data mining
Abstract:
Advanced neuroinformatics tools are required for methods of connectome mapping, analysis, and visualization. The inherent multi-modality of connectome datasets poses new challenges for data organization, integration, and sharing. We have designed and implemented the Connectome Viewer Toolkit, a set of free and extensible open-source neuroimaging tools written in Python. The key components of the toolkit are as follows: (1) the Connectome File Format, an XML-based container format to standardize multi-modal data integration and structured metadata annotation; (2) the Connectome File Format Library, which enables management and sharing of connectome files; and (3) the Connectome Viewer, an integrated research and development environment for visualization and analysis of multi-modal connectome data. The Connectome Viewer's plugin architecture supports extensions with network analysis packages and an interactive scripting shell to enable easy development and community contributions. Integration with tools from the scientific Python community makes it possible to leverage numerous existing libraries for powerful connectome data mining, exploration, and comparison. We demonstrate the applicability of the Connectome Viewer Toolkit using diffusion MRI datasets processed by the Connectome Mapper. The Connectome Viewer Toolkit is available from http://www.cmtk.org/.
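To illustrate the kind of connectome data mining this Python integration enables, here is a minimal sketch, not the toolkit's own API: it builds a region-by-region connectivity matrix with NumPy and computes basic graph metrics with NetworkX. The random matrix and weight threshold are stand-ins; a real analysis would load the matrix from a connectome file.

```python
import numpy as np
import networkx as nx

# Stand-in for a region-by-region connectivity matrix (e.g. fiber
# counts from diffusion MRI tractography); a real analysis would load
# this from a connectome file instead of generating it randomly.
rng = np.random.default_rng(42)
weights = rng.integers(0, 50, size=(20, 20))
weights = np.triu(weights, k=1) + np.triu(weights, k=1).T  # symmetric, zero diagonal

# Zero out weak connections below an (illustrative) threshold.
adjacency = np.where(weights > 10, weights, 0)

# Build an undirected weighted graph and compute simple metrics.
graph = nx.from_numpy_array(adjacency)
print("Nodes:", graph.number_of_nodes())
print("Edges:", graph.number_of_edges())
print("Mean weighted clustering:", nx.average_clustering(graph, weight="weight"))
```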
Abstract:
Drug safety issues pose serious health threats to the population and constitute a major cause of mortality worldwide. Due to the prominent implications for both public health and the pharmaceutical industry, it is of great importance to unravel the molecular mechanisms by which an adverse drug reaction can be potentially elicited. These mechanisms can be investigated by placing the pharmaco-epidemiologically detected adverse drug reaction in an information-rich context and by exploiting all currently available biomedical knowledge to substantiate it. We present a computational framework for the biological annotation of potential adverse drug reactions. First, the proposed framework investigates previous evidence on the drug-event association in the context of the biomedical literature (signal filtering). Then, it seeks to provide a biological explanation (signal substantiation) by exploring mechanistic connections that might explain why a drug produces a specific adverse reaction. The mechanistic connections include the activity of the drug, related compounds, and drug metabolites on protein targets, the association of protein targets with clinical events, and the annotation of proteins (both protein targets and proteins associated with clinical events) to biological pathways. Hence, the workflows for signal filtering and substantiation integrate modules for literature and database mining, in silico drug-target profiling, and analyses based on gene-disease networks and biological pathways. Application examples of these workflows, carried out on selected cases of drug safety signals, are discussed. The methodology and workflows presented offer a novel approach to exploring the molecular mechanisms underlying adverse drug reactions.
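As a toy illustration of the co-occurrence idea behind the literature-mining side of signal filtering, the sketch below counts how often a drug and an adverse event are mentioned together in a set of abstracts. The abstracts and the drug-event pair are invented examples; the published framework's literature-mining modules are considerably richer than this.

```python
# Count abstracts that mention both a drug and an adverse event.
# Texts and terms are hypothetical; real signal filtering would query
# a literature database and use proper named-entity recognition.
abstracts = [
    "rofecoxib was associated with myocardial infarction in this cohort",
    "myocardial infarction risk factors include smoking and hypertension",
    "rofecoxib reduced pain scores compared with placebo",
]
drug, event = "rofecoxib", "myocardial infarction"

co_mentions = sum(
    1 for text in abstracts
    if drug in text.lower() and event in text.lower()
)
print(f"{drug} + {event}: {co_mentions} co-mentioning abstract(s)")
```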
Abstract:
The objective of this study was to adapt a nonlinear model (Wang and Engel, WE) for simulating the phenology of maize (Zea mays L.), and to evaluate this model and a linear one (thermal time) in predicting the developmental stages of a field-grown maize variety. A field experiment was conducted in Santa Maria, RS, Brazil, in the 2005/2006 and 2006/2007 growing seasons, with seven sowing dates each. Dates of emergence, silking, and physiological maturity of the maize variety BRS Missões were recorded in six replications for each sowing date. Data collected in the 2005/2006 growing season were used to estimate the coefficients of the two models, and data collected in the 2006/2007 growing season were used as an independent data set for model evaluation. The nonlinear WE model accurately predicted the dates of silking and physiological maturity, and had a lower root mean square error (RMSE) than the linear (thermal time) model. The overall RMSE for silking and physiological maturity was 2.7 and 4.8 days with the WE model, and 5.6 and 8.3 days with the thermal time model, respectively.
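For reference, here is a minimal sketch of the two temperature responses being compared: the Wang and Engel (1998) beta function and linear thermal-time (growing degree-day) accumulation. The cardinal and base temperatures below are illustrative placeholders, not the coefficients calibrated for BRS Missões.

```python
import math

def wang_engel_f(t, tmin=8.0, topt=28.0, tmax=36.0):
    """Wang & Engel (1998) beta temperature-response function,
    returning a dimensionless rate in [0, 1]. Cardinal temperatures
    here are illustrative, not the calibrated maize values."""
    if t <= tmin or t >= tmax:
        return 0.0
    alpha = math.log(2.0) / math.log((tmax - tmin) / (topt - tmin))
    num = (2.0 * (t - tmin) ** alpha * (topt - tmin) ** alpha
           - (t - tmin) ** (2.0 * alpha))
    return num / (topt - tmin) ** (2.0 * alpha)

def thermal_time(daily_mean_temps, tbase=10.0):
    """Linear thermal-time accumulation above a base temperature."""
    return sum(max(0.0, t - tbase) for t in daily_mean_temps)

temps = [18.0, 22.0, 30.0, 34.0]  # daily mean air temperatures, degrees C
print([round(wang_engel_f(t), 3) for t in temps])
print("GDD:", thermal_time(temps))
```

Note that the beta function penalizes supra-optimal temperatures (its response falls back to zero at tmax), which is the main behavioral difference from the linear thermal-time model.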
Abstract:
The objective of this work was to evaluate an estimation system for rice yield in Brazil, based on simple agrometeorological models and on the technological level of production systems. This estimation system incorporates the conceptual basis proposed by Doorenbos & Kassam for potential and attainable yields, with empirical adjustments for maximum yield and crop sensitivity to water deficit, considering five categories of rice yield. Rice yield was estimated from 2000/2001 to 2007/2008 and compared to IBGE yield data. Regression analyses between model estimates and data from IBGE surveys resulted in significant coefficients of determination, with less dispersion in the South than in the North and Northeast regions of the country. The index of model efficiency (E1') ranged from 0.01 in the lower yield classes to 0.45 in the higher ones, and the mean absolute error ranged from 58 to 250 kg ha-1, respectively.
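The Doorenbos & Kassam conceptual basis referred to here is the FAO-33 water-deficit yield response, (1 - Ya/Ym) = ky (1 - ETa/ETm), relating relative yield loss to relative evapotranspiration deficit through a crop sensitivity factor ky. A minimal sketch follows, with illustrative numbers rather than the paper's calibrated values.

```python
def attainable_yield(ym, eta, etm, ky):
    """Doorenbos & Kassam (FAO-33) water-deficit yield response:
    (1 - Ya/Ym) = ky * (1 - ETa/ETm). Returns attainable yield Ya
    in the same unit as the maximum yield Ym."""
    relative_deficit = 1.0 - eta / etm
    ya = ym * (1.0 - ky * relative_deficit)
    return max(ya, 0.0)

# Illustrative numbers only (not the paper's calibrated values):
# maximum yield 8000 kg/ha, actual/maximum evapotranspiration of
# 350/500 mm, and yield-response factor ky = 1.1.
print(attainable_yield(ym=8000.0, eta=350.0, etm=500.0, ky=1.1))  # 5360.0
```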
Abstract:
BACKGROUND: Selective publication of studies, which is commonly called publication bias, is widely recognized. Over the years a new nomenclature for other types of bias related to non-publication or distortion related to the dissemination of research findings has been developed. However, several of these different biases are often still summarized by the term 'publication bias'. METHODS/DESIGN: As part of the OPEN Project (To Overcome failure to Publish nEgative fiNdings) we will conduct a systematic review with the following objectives: (1) to systematically review highly cited articles that focus on non-publication of studies and to present the various definitions of biases related to the dissemination of research findings contained in the articles identified; and (2) to develop and discuss a new framework on the nomenclature of various aspects of distortion in the dissemination process that leads to public availability of research findings, in an international group of experts in the context of the OPEN Project. We will systematically search Web of Knowledge for highly cited articles that provide a definition of biases related to the dissemination of research findings. A specifically designed data extraction form will be developed and pilot-tested. Working in teams of two, we will independently extract relevant information from each eligible article. For the development of a new framework we will construct an initial table listing different levels and different hazards en route to making research findings public. An international group of experts will iteratively review the table and reflect on its content until no new insights emerge and consensus has been reached. DISCUSSION: Results are expected to be publicly available in mid-2013. This systematic review, together with the results of other systematic reviews of the OPEN project, will serve as a basis for the development of future policies and guidelines regarding the assessment and prevention of publication bias.
Abstract:
Accelerating competition between companies has confronted them with difficult challenges. Products should reach the market faster, and new products should be better than the old ones, and above all better than the competitors' corresponding products. At the same time, the design, manufacturing, and other costs of the products should not be high. Companies often try to meet these challenges with the help of product data and its management and exchange. Andritz, like other companies, must take these issues into account to succeed in the competition. This thesis was written for Andritz, one of the world's leading manufacturers of equipment and providers of maintenance services for pulp and paper production. Andritz is deploying an ERP system at all of its sites. The company wants to exploit the system as effectively as possible, so it also wants product data covering the whole life cycle in the system. Part of the product data is created by Andritz's partners and subcontractors, so the data exchange between partners should also be handled in such a way that the data flows directly into the ERP system. The goal of this thesis is therefore to find a solution for handling the data exchange between Andritz and its partners. This master's thesis presents the purpose and importance of product data and of its management and exchange. Different solution alternatives for implementing a data exchange system are presented. Some of them are based on general and industry-specific standards. Two commercial products are also presented. The following standards are examined: PaperIXI, papiNet, X-OSCO, the PSK standards, and RosettaNet. In addition, the data exchange solutions of the ERP vendor, SAP, are examined. The best of these alternatives are then examined in more detail, and finally the different solutions are compared with each other in order to find the alternative that best fits Andritz's needs.
Abstract:
The primary objective of this study was to examine how trust is built in a virtual team. The analysis focused on identifying the sources of trust, on relationship building, and on technology-mediated communication. Practical means and applications were also sought. In this study, trust was seen as an important enabler of cooperation and as a central element in building interpersonal relationships. The study was an empirical and descriptive case study. Qualitative data were collected mainly through a web-based survey and telephone interviews, so the data collection itself was carried out largely virtually. The data were analyzed thematically; themes were identified in the text mainly on the basis of assumptions derived from theory. The study found that the mechanisms that build trust can be roughly classified as shared goals and responsibilities, communication, social interaction and information sharing, consideration for others, and personal characteristics. These mechanisms did not differ greatly from the mechanisms of trust building in a traditional context. In the early phase of virtual teamwork, trust was based on perceptions of the other team members' competence. Institutional identification also laid a foundation for trust in the early phase. Otherwise, trust was built gradually through task-related and social communication. The importance of deeds became emphasized over time. Practical means for building trust were also presented. Existing technologies were found to support relationship building well in tasks related to sharing and storing information, whereas from the interaction point of view the support was not seen as equally comprehensive. All in all, however, improvements in social relationships are likely to achieve more than improvements in technology.
Abstract:
Background: Information about the composition of regulatory regions is of great value for designing experiments to functionally characterize gene expression. The multiplicity of available applications to predict transcription factor binding sites in a particular locus contrasts with the substantial computational expertise demanded to manipulate them, which may constitute a potential barrier for the experimental community. Results: CBS (Conserved regulatory Binding Sites, http://compfly.bio.ub.es/CBS) is a public platform of evolutionarily conserved binding sites and enhancers predicted in multiple Drosophila genomes, furnished with published chromatin signatures associated with transcriptionally active regions and other experimental sources of information. Rapid access to this novel body of knowledge through a user-friendly web interface enables non-expert users to identify the binding sequences available for any particular gene, transcription factor, or genome region. Conclusions: The CBS platform is a powerful resource that provides tools for mining individual sequences and groups of co-expressed genes, together with epigenomic information, to conduct regulatory screenings in Drosophila.
Abstract:
This paper reviews the history of Hg contamination in Brazil by characterizing and quantifying two major sources of Hg emissions to the environment: industrial sources and gold mining. Industry was responsible for nearly 100% of total Hg emissions from the late 1940s to the early 1970s, when efficient control policies were enforced, leading to a decrease in emissions. Gold mining, on the other hand, was nearly insignificant as a Hg source up to the late 1970s, but is presently responsible for over 80% of total emissions. Over 115 tons of Hg are currently released into the atmosphere in Brazil annually: nearly 78 tons come from gold mining operations, 12 tons from the chlor-alkali industry, and 25 tons from all other industrial uses. Inputs to soils and waters, however, are still unknown, due to the lack of a detailed database. Emissions from diffuse sources, rather than from the well-studied classical industrial sources, are probably responsible for the major inputs of mercury to these compartments.
Abstract:
Recommender systems attempt to predict items in which a user might be interested, given some information about the user's and items' profiles. Most existing recommender systems use content-based or collaborative filtering methods, or hybrid methods that combine both techniques (see the sidebar for more details). We created Informed Recommender to address the problem of using consumer opinions about products, expressed online in free-form text, to generate product recommendations. Informed Recommender uses prioritized consumer product reviews to make recommendations. Using text-mining techniques, it automatically maps each piece of each review comment into an ontology.
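A minimal sketch of the mapping step follows, assuming a toy feature ontology and simple keyword lookup. The concepts, keywords, and review text are invented for illustration; the actual system relies on richer text-mining techniques than a keyword table.

```python
import re

# Hypothetical product-feature ontology: concept -> trigger keywords.
ontology = {
    "battery": ["battery", "charge", "charging"],
    "screen": ["screen", "display", "resolution"],
    "price": ["price", "cheap", "expensive", "value"],
}

def map_to_ontology(sentence):
    """Map a free-form review sentence onto ontology concepts whose
    keywords appear in the sentence."""
    tokens = set(re.findall(r"[a-z]+", sentence.lower()))
    return {concept for concept, keywords in ontology.items()
            if any(k in tokens for k in keywords)}

review = "Great display and good value, but the battery drains fast"
print(map_to_ontology(review))  # {'screen', 'price', 'battery'}
```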
Abstract:
Fluent health information flow is critical for clinical decision-making. However, a considerable part of this information is free-form text, and the inability to utilize it creates risks to patient safety and cost-effective hospital administration. Methods for automated processing of clinical text are emerging. The aim of this doctoral dissertation is to study machine learning and clinical text in order to support health information flow. First, by analyzing the content of authentic patient records, the aim is to specify clinical needs in order to guide the development of machine learning applications. The contributions are a model of the ideal information flow, a model of the problems and challenges in reality, and a road map for the technology development. Second, by developing applications for practical cases, the aim is to concretize ways to support health information flow. Altogether five machine learning applications for three practical cases are described: the first two applications are binary classification and regression, related to the practical case of topic labeling and relevance ranking; the third and fourth applications are supervised and unsupervised multi-class classification, for the practical case of topic segmentation and labeling. These four applications are tested with Finnish intensive care patient records. The fifth application is multi-label classification for the practical task of diagnosis coding; it is tested with English radiology reports. The performance of all these applications is promising. Third, the aim is to study how the quality of machine learning applications can be reliably evaluated. The associations between performance evaluation measures and methods are addressed, and a new hold-out method is introduced. This method contributes not only to processing time but also to evaluation diversity and quality. The main conclusion is that developing machine learning applications for text requires interdisciplinary, international collaboration. Practical cases are very different, and hence the development must begin from genuine user needs and domain expertise. The technological expertise must cover linguistics, machine learning, and information systems. Finally, the methods must be evaluated both statistically and through authentic user feedback.
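In the spirit of the fifth application, here is a minimal sketch of multi-label text classification with scikit-learn: TF-IDF features feeding a one-vs-rest logistic regression per label. The toy reports and label set are invented for illustration and do not reflect the dissertation's data or methods.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.multiclass import OneVsRestClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MultiLabelBinarizer

# Toy radiology reports with zero or more diagnosis labels each.
reports = [
    "chest x-ray shows right lower lobe pneumonia",
    "no acute cardiopulmonary abnormality",
    "pneumonia with small pleural effusion on the right",
    "normal heart size, clear lungs",
]
labels = [["pneumonia"], [], ["pneumonia", "effusion"], []]

# Encode label lists as a binary indicator matrix.
binarizer = MultiLabelBinarizer()
y = binarizer.fit_transform(labels)

# One independent binary classifier per label over TF-IDF features.
model = make_pipeline(
    TfidfVectorizer(),
    OneVsRestClassifier(LogisticRegression(max_iter=1000)),
)
model.fit(reports, y)

pred = model.predict(["left lower lobe pneumonia and effusion"])
print(binarizer.inverse_transform(pred))
```

With data this small the prediction is of course unreliable; the point is only the multi-label plumbing of indicator matrices and one-vs-rest classifiers.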
Abstract:
In Brazil, the State of Goiás is one of the frontiers of sugarcane expansion to meet the growing demand for biofuels. The objective of this study was to identify the municipalities in the State of Goiás where annual crops (mainly grains) were replaced by sugarcane, and to indicate correlations between the sugarcane expansion and family farming production, in the period between 2005 and 2010. For this purpose, the grain crop mask and the sugarcane crop mask, obtained from satellite images, were intersected using geoprocessing techniques. IBGE data on sugarcane production and planted area were also used, together with data on family farming production linked to the National Food Acquisition Program (PAA), regarding the number of cooperatives and family farmers. The crop masks and the data tables of the National Food Acquisition Program were provided by the National Food Supply Agency. There were 95 municipalities with crop replacement, totaling 281,554 hectares of grains converted to sugarcane. We highlight the municipalities of Santa Isabel, Iaciara, Maurilândia, and Itapaci, where this change represented more than half of their agricultural areas. Regarding family farming, the sugarcane expansion in the State of Goiás did not affect its activities during the period studied.
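The mask-intersection step can be sketched as follows, assuming two boolean rasters on the same grid and a hypothetical 30 m pixel size; the random arrays stand in for the satellite-derived crop masks.

```python
import numpy as np

# Random stand-ins for co-registered boolean crop masks: grain crops
# at the start of the period and sugarcane at the end of it.
rng = np.random.default_rng(0)
grains_2005 = rng.random((1000, 1000)) > 0.5
sugarcane_2010 = rng.random((1000, 1000)) > 0.7

# Converted area: pixels that were grains and became sugarcane.
converted = grains_2005 & sugarcane_2010
pixel_area_ha = (30 * 30) / 10_000  # assumed 30 m pixels -> 0.09 ha each
print("Converted area: %.1f ha" % (converted.sum() * pixel_area_ha))
```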
Abstract:
ABSTRACT This study aimed to compare thematic maps of soybean yield for different sampling grids, using geostatistical methods (the semivariance function and kriging). The analysis was performed with soybean yield data, in t ha-1, from a commercial area sampled on regular grids with distances between points of 25x25 m, 50x50 m, 75x75 m, and 100x100 m (549, 188, 66, and 44 sampling points, respectively), and with data obtained by yield monitors. Optimized sampling schemes were also generated with the simulated annealing algorithm, using maximization of the overall accuracy measure as the optimization criterion. The results showed that sample size and sample density influenced the description of the spatial distribution of soybean yield. When the sample size was increased, the thematic maps described the spatial variability of soybean yield more efficiently (higher values of the accuracy indices and lower values for the sum of squared estimation errors). In addition, more accurate maps were obtained, especially with the optimized sample configurations with 188 and 549 sample points.
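As background for the geostatistical step, here is a minimal sketch of the classical (Matheron) empirical semivariance estimator, gamma(h) = sum of (z_i - z_j)^2 over pairs separated by approximately h, divided by 2N(h). The coordinates and yields below are randomly generated stand-ins for the sampled field data, and the lag spacing is illustrative.

```python
import numpy as np

def empirical_semivariogram(coords, values, lags, tol):
    """Classical (Matheron) semivariance estimator: for each lag h,
    gamma(h) = (1 / (2 * N(h))) * sum of (z_i - z_j)^2 over point
    pairs whose separation is within tol of h."""
    d = np.linalg.norm(coords[:, None, :] - coords[None, :, :], axis=-1)
    sq = (values[:, None] - values[None, :]) ** 2
    i, j = np.triu_indices(len(values), k=1)  # count each pair once
    gamma = []
    for h in lags:
        mask = np.abs(d[i, j] - h) <= tol
        gamma.append(sq[i, j][mask].mean() / 2.0 if mask.any() else np.nan)
    return np.array(gamma)

# Illustrative use on random points; real input would be the grid
# coordinates and the soybean yields (t/ha) sampled in the field.
rng = np.random.default_rng(1)
pts = rng.random((200, 2)) * 500.0    # positions in metres
z = rng.normal(3.0, 0.4, size=200)    # yields, t/ha
print(empirical_semivariogram(pts, z, lags=[25, 50, 75, 100], tol=12.5))
```

A theoretical model fitted to this empirical curve then supplies the weights used by kriging to interpolate the thematic map.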
Abstract:
ABSTRACT Global warming increases the occurrence of events such as extreme heat waves. Research on the thermal and air conditions affecting the broiler-rearing environment is important to evaluate animal welfare under extreme heat, aiming at mitigation measures. This study aimed at evaluating the effect of a simulated heat wave, in a climatic chamber, on the thermal and air environment of 42-day-old broilers. One hundred and sixty broilers were housed and reared for 42 days in a climatic chamber, divided into eight pens. The heat wave simulation and data sampling were performed on the 42nd day, the period of greatest impact. The analyzed variables were room and litter temperatures, relative humidity, and the concentrations of oxygen, carbon monoxide, and ammonia at each pen. These variables were assessed every two hours, starting at 8 a.m. and simulating daytime heating up to 4 p.m., when the maximum temperature was reached. From the results, we concluded that increasing room temperatures promoted a proportional rise in litter temperatures, contributing to ammonia volatilization. In addition, oxygen concentrations decreased with increasing temperatures, and carbon monoxide was only observed at temperatures above 27.0 °C, relative humidity higher than 88.4%, and litter temperatures above 30.3 °C.
Abstract:
After decades of mergers and acquisitions and successive technology trends such as CRM, ERP, and DW, the data in enterprise systems is scattered and inconsistent. Global organizations face the challenge of addressing local uses of shared business entities, such as customer and material, while at the same time maintaining a consistent, unique, and consolidated view of financial indicators. In addition, current enterprise systems do not accommodate the pace of organizational change, and immense efforts are required to maintain data. When it comes to systems integration, ERPs are considered “closed” and expensive. Data structures are complex, and the “out-of-the-box” integration options offered are not based on industry standards. Therefore, expensive and time-consuming projects are undertaken in order to have the required data flowing according to the needs of business processes. Master Data Management (MDM) emerges as a discipline focused on ensuring long-term data consistency. Presented as a technology-enabled business discipline, it emphasizes business processes and governance to model and maintain the data related to key business entities. There are immense technical and organizational challenges in accomplishing the “single version of the truth” MDM mantra. Adding one central repository of master data might prove unfeasible in some scenarios, so an incremental approach is recommended, starting from the areas most critically affected by data issues. This research aims at understanding the current literature on MDM and contrasting it with views from professionals. The data collected from interviews revealed details of the complexities of data structures and data management practices in global organizations, reinforcing the call for more in-depth research on the organizational aspects of MDM. The most difficult piece of master data to manage is the “local” part: the attributes related to the sourcing and storing of materials in one particular warehouse in the Netherlands, or a complex set of pricing rules for a subsidiary of a customer in Brazil. From a practical perspective, this research evaluates an MDM solution under development at a Finnish IT solution provider. By applying an existing assessment method, the research attempts to provide the company with a possible tool to evaluate its product from a vendor-agnostic perspective.