796 results for Data-Mining Techniques
Abstract:
Background: Since their inception, Twitter and related microblogging systems have provided a rich source of information for researchers and have attracted interest in their affordances and use. Since 2009, PubMed has included 123 journal articles on medicine and Twitter, but no overview exists of how the field uses Twitter in research.

Objective: This paper aims to identify published work relating to Twitter indexed by PubMed, and then to classify it. This classification provides a framework in which future researchers can position their work, and an understanding of the current reach of research using Twitter in medical disciplines. Limiting the study to papers indexed by PubMed ensures the work provides a reproducible benchmark.

Methods: Papers indexed by PubMed on Twitter and related topics were identified and reviewed. The papers were then qualitatively classified based on title and abstract to determine their focus. The Twitter-focused work was studied in detail to determine what data, if any, it was based on, and from this a categorization of the dataset sizes used in the studies was developed. Using open-coded content analysis, additional important categories were identified, relating to primary methodology, domain, and aspect.

Results: As of 2012, PubMed comprises more than 21 million citations from the biomedical literature, from which a corpus of 134 potentially Twitter-related papers was identified; eleven were subsequently found not to be relevant. There were no papers prior to 2009 relating to microblogging, a term first used in 2006. Of the remaining 123 papers that mentioned Twitter, thirty were focused on Twitter (the others referring to it tangentially). The early Twitter-focused papers introduced the topic and highlighted its potential without carrying out any form of data analysis. The majority of published papers used analytic techniques to sort through thousands, if not millions, of individual tweets, often depending on automated tools to do so. Our analysis demonstrates that researchers are starting to use knowledge discovery methods and data mining techniques to understand vast quantities of tweets: the study of Twitter is becoming quantitative research.

Conclusions: To the best of our knowledge, this is the first overview study of medicine-related research based on Twitter and related microblogging. We used five dimensions to categorize published medicine-related research on Twitter. This classification provides a framework within which researchers studying the development and use of Twitter in medical research, and those undertaking comparative studies of Twitter research in medicine and beyond, can position and ground their work.
Abstract:
Wikipedia is a free, web-based, collaborative, multilingual encyclopedia project supported by the non-profit Wikimedia Foundation. Because Wikipedia is free and open for anyone to edit, article quality may suffer: people do not have equal levels of knowledge, and different people hold different opinions on a topic, so contributions by different authors may differ in quality. It is therefore important to classify articles so that good-quality articles can be separated from poor-quality ones, which should be removed from the database. The aim of this study is to classify Wikipedia articles into two classes, class 0 (poor quality) and class 1 (good quality), using the Adaptive Neuro-Fuzzy Inference System (ANFIS) and data mining techniques. Two ANFIS models were built using the Fuzzy Logic Toolbox [1] available in MATLAB. The first is based on rules obtained from the J48 classifier in WEKA, while the second was built using expert knowledge. The data used in this work comprise 226 article records taken from the German version of Wikipedia. The dataset consists of 19 inputs and one output; it was preprocessed to remove redundant attributes. The input variables relate to the editors, contributors, article length, and article lifecycle. Finally, the methods implemented in this work are analyzed to compare the performance of each classification method.
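As a rough illustration of the rule-extraction step, the sketch below trains a CART decision tree with scikit-learn as a stand-in for WEKA's J48 (both are C4.5-style learners) and prints its if-then rules, the kind of rules that seeded the first ANFIS. The file name, feature columns, and label are hypothetical placeholders for the German-Wikipedia dataset.

```python
# Minimal sketch of the rule-extraction step, using scikit-learn's CART
# decision tree as a stand-in for WEKA's J48. "articles.csv" and the
# "quality" column (0 = poor, 1 = good) are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier, export_text

data = pd.read_csv("articles.csv")          # 226 rows, 19 inputs + 1 output
X = data.drop(columns=["quality"])
y = data["quality"]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=42, stratify=y
)

tree = DecisionTreeClassifier(max_depth=4, random_state=42)
tree.fit(X_train, y_train)

print(f"held-out accuracy: {tree.score(X_test, y_test):.2f}")
# If-then rules of the kind that would seed the first ANFIS's rule base.
print(export_text(tree, feature_names=list(X.columns)))
```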
Predictive models for chronic renal disease using decision trees, naïve Bayes and case-based methods
Abstract:
Data mining can be used in the healthcare industry to "mine" clinical data and discover hidden information for intelligent and effective decision making. Hidden patterns and relationships often go undiscovered, and advanced data mining techniques can remedy this. This thesis deals with Intelligent Prediction of Chronic Renal Disease (IPCRD). The data cover blood tests, urine tests, and external symptoms used to predict chronic renal disease. Data from the database are first loaded into Weka (3.6), and the Chi-Square method is used for feature selection. After normalizing the data, three classifiers were applied and the quality of their output evaluated: Decision Tree, Naïve Bayes, and the K-Nearest Neighbour (KNN) algorithm. Results show that each technique has its unique strength in realizing the objectives of the defined mining goals. The efficiency of Decision Tree and KNN was almost the same, but Naïve Bayes proved to have a comparative edge over the others. Sensitivity and specificity tests are further used as statistical measures of binary classification performance: sensitivity (also called recall in some fields) measures the proportion of actual positives that are correctly identified, while specificity measures the proportion of negatives that are correctly identified. The CRISP-DM methodology is applied to build the mining models; it consists of six major phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
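The sensitivity and specificity definitions above translate directly into code. A minimal sketch, with hypothetical labels and predictions:

```python
# Sensitivity and specificity from a binary confusion matrix, as used in
# the thesis to compare classifiers. y_true / y_pred are hypothetical.
from sklearn.metrics import confusion_matrix

y_true = [1, 0, 1, 1, 0, 0, 1, 0]   # 1 = chronic renal disease present
y_pred = [1, 0, 1, 0, 0, 1, 1, 0]   # classifier output

tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
sensitivity = tp / (tp + fn)   # recall: actual positives correctly identified
specificity = tn / (tn + fp)   # actual negatives correctly identified
print(f"sensitivity = {sensitivity:.2f}, specificity = {specificity:.2f}")
```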
Abstract:
This article highlights the potential benefits of the Kohonen method for classifying rivers with similar characteristics when determining regional ecological flows using the ELOHA (Ecological Limits of Hydrologic Alteration) methodology. Many methodologies currently exist for classifying rivers, but none of them combines the characteristics of the Kohonen method: (i) it provides the number of groups that actually underlie the information presented, (ii) it can be used for variable-importance analysis, (iii) it can display the classification process in two dimensions, and (iv) the clustering structure remains stable regardless of the parameters used in the model. To evaluate these potential benefits, 174 flow stations distributed along the great river basin "Magdalena-Cauca" (Colombia) were analyzed, with 73 variables obtained for the classification process in each case. Six trials were run using different combinations of variables, and the results were validated against a reference classification obtained by Ingfocol in 2010, whose results were also framed using ELOHA guidelines. The validation found that two of the tested models reproduced more than 80% of the reference classification on the first trial, meaning that more than 80% of the flow stations analyzed in both models formed invariant groups of streams.
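A minimal sketch of the Kohonen clustering step, using the MiniSom package rather than the authors' implementation; the station matrix, map size, and training parameters are illustrative assumptions:

```python
# Kohonen self-organizing map sketch with the MiniSom package.
# "stations" stands in for the 174 x 73 matrix of flow stations and
# hydrologic variables; grid size, sigma and learning rate are
# illustrative choices, not the study's settings.
import numpy as np
from minisom import MiniSom

rng = np.random.default_rng(0)
stations = rng.random((174, 73))            # placeholder for the real data

som = MiniSom(x=4, y=4, input_len=73, sigma=1.0, learning_rate=0.5,
              random_seed=0)
som.random_weights_init(stations)
som.train_random(stations, num_iteration=5000)

# Each station is assigned to its best-matching unit; units play the role
# of river classes in the ELOHA sense.
classes = [som.winner(s) for s in stations]
print(f"{len(set(classes))} groups found on a 4x4 map")
```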
Abstract:
An underwater gas pipeline is the portion of a pipeline that crosses a river beneath its bottom. Underwater gas pipelines are subject to increasing dangers as time goes by, and an accident at one can lead to a technological and environmental disaster on the scale of an entire region. Timely troubleshooting of all underwater gas pipelines to prevent potential accidents will therefore remain a pressing task for the industry, and the most important aspect of resolving this challenge is the quality of the automated system in question. The industry currently has no automated system that fully meets the needs of the experts maintaining underwater gas pipelines. Principal aim of this research: to develop a new automated monitoring system that simplifies evaluating the technical condition and making decisions on planning, preventive maintenance, and repair work on underwater gas pipelines. Objectives: creation of a shared model for the new automated system via IDEF3; development of a new database system to store all information about underwater gas pipelines; development of a new application that works with database servers and explains the results obtained from the server; and calculation of MTBF values for specified pipelines based on quantitative data obtained from tests of the system. Conclusion: the new automated system, PodvodGazExpert, has been developed for timely and accurate determination of the physical condition of underwater gas pipelines; its mathematical analysis is based on the principal component analysis (PCA) method; and determining the physical condition of an underwater gas pipeline with the new system increases the MTBF by a factor of 8.18 over the existing system used in the industry today.
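Since the abstract states that the system's mathematical analysis rests on principal component analysis, here is a minimal PCA sketch with scikit-learn; the inspection matrix and its columns are hypothetical, not PodvodGazExpert's actual inputs:

```python
# Minimal PCA sketch in the spirit of the system's mathematical core,
# using scikit-learn rather than the PodvodGazExpert implementation.
# "inspections" is a hypothetical matrix: one row per pipeline survey,
# one column per measured parameter (wall thickness, burial depth, ...).
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(1)
inspections = rng.random((50, 12))          # placeholder measurements

scaled = StandardScaler().fit_transform(inspections)
pca = PCA(n_components=3)
scores = pca.fit_transform(scaled)

# The leading components summarize the pipeline's technical condition;
# explained variance shows how much information is retained.
print(pca.explained_variance_ratio_)
print(scores[:5])
```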
Abstract:
Graduate Program in Human Development and Technologies - IBRC
Abstract:
The current model of the Brazilian electric sector allows equal conditions for all agents and reduces the role of the State in the sector. This model forces companies in the sector to continually improve the quality of their product and, as a requirement for this goal, to make more effective use of the enormous amount of operational data stored in databases, originating from the operation of their electrical systems, which have hydroelectric plants (UHE) as their main source of power generation. One of the main tools for managing these plants is Supervisory Control And Data Acquisition (SCADA) systems. The immense amount of data accumulated in databases by SCADA systems, very likely containing relevant information, should therefore be processed to discover relationships and patterns, helping to understand many important operational aspects and to evaluate the performance of electric power systems. Knowledge Discovery in Databases (KDD) is the process of identifying, in large data sets, patterns that are valid, novel, useful, and understandable, in order to improve the understanding of a problem or a decision-making procedure. Data Mining is the step within KDD that extracts useful information from large databases. In this scenario, the present work proposes to carry out data mining experiments on the data generated by SCADA systems in hydroelectric plants, in order to produce relevant information that assists in the planning, operation, maintenance, and security of the plants and in establishing a culture of data mining applied to them.
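As one illustration of the kind of data mining experiment proposed, the sketch below clusters hypothetical SCADA operating points to surface recurring operating regimes; the file and column names are assumptions, not the plant's actual tags:

```python
# Illustrative sketch only: clustering SCADA operating points of a
# hydroelectric plant to reveal recurring operating regimes.
# "scada_log.csv" and the column names are hypothetical placeholders.
import pandas as pd
from sklearn.cluster import KMeans
from sklearn.preprocessing import StandardScaler

scada = pd.read_csv("scada_log.csv")        # e.g. one row per minute
cols = ["active_power_mw", "head_m", "flow_m3s", "vibration_mm_s"]

X = StandardScaler().fit_transform(scada[cols])
km = KMeans(n_clusters=4, n_init=10, random_state=0).fit(X)

scada["regime"] = km.labels_
# Average operating conditions per discovered regime.
print(scada.groupby("regime")[cols].mean())
```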
Abstract:
The techniques used for static security assessment of electric power systems depend on running a large number of load-flow cases for various system topologies and operating conditions. In real-time operating environments this practice is difficult to carry out, especially in large systems, where running all the required load-flow cases demands considerable time and computational effort even with the resources available today. Data mining techniques such as decision trees have been used in recent years and have achieved good results in static and dynamic security assessment of electric power systems. This work presents a methodology for real-time static security assessment of electric power systems using decision trees: from off-line load-flow simulations, run with the Anarede software (CEPEL), an extensive labeled database of system states was generated for various operating conditions. This database was used to induce the decision trees, providing a fast and accurate prediction model that classifies the system state (secure or insecure) for real-time application. This methodology reduces computational load in the on-line environment, since processing a decision tree requires only the evaluation of a few if-then logical statements, with a small number of numerical tests at the binary nodes to determine the attribute value that satisfies the rules; the number of tests equals the number of hierarchical levels of the tree, which is usually small. With this simple processing, static security assessment can be performed in a fraction of the time required by the fastest traditional methods. To validate the methodology, a case study based on a real power system was carried out, in which, for each contingency classified as insecure, a corrective control action is executed based on the decision tree's information about the critical attribute that most affects security. The results showed the methodology to be an important tool for real-time static security assessment in a system operation center.
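A minimal sketch of the tree-induction step, assuming a labeled load-flow database like the one generated off-line with Anarede; the file, attributes, and labels are hypothetical:

```python
# Sketch of the tree-induction step under stated assumptions: a labeled
# database of off-line load-flow results ("loadflow.csv", hypothetical)
# with operating-point attributes and a secure/insecure label.
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

db = pd.read_csv("loadflow.csv")
X = db.drop(columns=["state"])              # voltages, flows, loadings, ...
y = db["state"]                             # "secure" / "insecure"

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                           random_state=0, stratify=y)
tree = DecisionTreeClassifier(max_depth=5, random_state=0).fit(X_tr, y_tr)

print(f"accuracy: {tree.score(X_te, y_te):.3f}")
# In real time, classifying a new operating point costs only a handful of
# if-then tests, one per level of the tree:
print(tree.predict(X_te.iloc[:1]))
# The most important attribute would guide the corrective control action.
print(X.columns[np.argmax(tree.feature_importances_)])
```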
Abstract:
Graduate Program in Computer Science - IBILCE
Abstract:
With the increase in stakeholders, and consequently in the volume of financial transactions, the study of new investment strategies for the stock market using data mining techniques has been the target of important research. Data mining allows large historical databases to be processed and analyzed in search of patterns that can be used to make investment decisions. With the goal of earning more than the real index's gain, we propose a transaction strategy using rules built by a classification algorithm. To that end, daily historical data for the Ibovespa index and Petrobras stock are organized and processed to find the attributes that weigh most decisively in an investment decision. To test the accuracy of the proposed rules, a simulated portfolio is managed, showing the performance of the decisions against the real performance of the index and the stock. Following the proposed rules, the results show that the investment strategy yields a higher return than the stock market's. The characteristics of the algorithms maximize the gain within the analyzed period, allowing us to determine each technique's return and the number of days needed to double the initial investment. The best classifier applied to the time series, used in the proposed investment strategy, would require 104 days to double the initial capital.
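A minimal sketch of the strategy pipeline described above, with hypothetical data and features (not the thesis's actual attributes): build simple indicators from daily closes, label each day by next-day direction, and extract if-then trading rules from a decision tree.

```python
# Illustrative sketch of a rule-based strategy pipeline. "ibov.csv" and
# the two features are hypothetical placeholders for the real attributes.
import pandas as pd
from sklearn.tree import DecisionTreeClassifier, export_text

px = pd.read_csv("ibov.csv", parse_dates=["date"], index_col="date")
px["ret_1d"] = px["close"].pct_change()
px["ma_ratio"] = px["close"] / px["close"].rolling(10).mean()
px["up_next"] = (px["close"].shift(-1) > px["close"]).astype(int)
px = px.dropna().iloc[:-1]      # drop warm-up rows and the unlabeled last day

X, y = px[["ret_1d", "ma_ratio"]], px["up_next"]
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

# The induced if-then rules are the trading signals a paper portfolio
# would follow (buy when class 1 is predicted, stay out otherwise).
print(export_text(clf, feature_names=["ret_1d", "ma_ratio"]))
```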
Abstract:
Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
Presentations sponsored by the Patent and Trademark Depository Library Association (PTDLA) at the American Library Association Annual Conference, New Orleans, June 25, 2006.

Speaker #1: Nan Myers, Associate Professor; Government Documents, Patents and Trademarks Librarian, Wichita State University, Wichita, KS. Title: Intellectual Property Roundup: Copyright, Trademarks, Trade Secrets, and Patents. Abstract: This presentation provides a capsule overview of the distinctive coverage of the four types of intellectual property: what they are, why they are important, how to get them, what they cost, and how long they last. Emphasis is on the questions patrons ask most, along with the answers. Includes coverage of the mission of Patent & Trademark Depository Libraries (PTDLs) and other sources of business information outside of libraries, such as Small Business Development Centers.

Speaker #2: Jan Comfort, Government Information Reference Librarian, Clemson University, Clemson, SC. Title: Patents as a Source of Competitive Intelligence Information. Abstract: Large corporations often have R&D departments, or large numbers of staff whose job is to monitor the activities of their competitors. This presentation reviews strategies that small business owners can employ to do their own competitive intelligence analysis. The focus is on features of the patent database available free of charge on the USPTO website, as well as commercial databases available at many public and academic libraries across the country.

Speaker #3: Virginia Baldwin, Professor; Engineering Librarian, University of Nebraska-Lincoln, Lincoln, NE. Title: Mining Online Patent Data for Business Information. Abstract: The United States Patent and Trademark Office (USPTO) website and the websites of international databases contain information about granted patents, patent applications, and the technologies they represent. Statistical information about patents, their technologies, geographical information, and patenting entities is compiled and available as reports on the USPTO website. Other valuable information from these websites can be obtained using data mining techniques. This presentation provides the keys to opening these resources and obtaining valuable data.

Speaker #4: Donna Hopkins, Engineering Librarian, Rensselaer Polytechnic Institute, Troy, NY. Title: Searching the USPTO Trademark Database for Wordmarks and Logos. Abstract: This presentation provides an overview of wordmark searching on www.uspto.gov, followed by a review of techniques for searching non-word US trademarks using codes from the Design Search Code Manual. These codes are used in an electronic search, either on the USPTO website or on CASSIS DVDs; the search is sometimes supplemented by consulting the Official Gazette. A specific example of using a section of the codes for searching is included. Similar searches on the Madrid Express database of WIPO, using the Vienna Classification, are also briefly described.
Abstract:
Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)
Abstract:
The Dipteran studied here is a native Brazilian insect that has become a valuable model system for developmental biology research because it provides an interesting opportunity to study a different type of insect oogenesis. Sequences from a cDNA library constructed with poly A+ RNA from the ovaries of larvae at different ages were analyzed. Molecular characterization confirmed interesting findings, such as the presence of nanos. The nanos gene encodes a conserved RNA-binding protein that is required during early development for the maintenance and division of the primordial germ cells of Diptera. nanos plays an important role in specifying the posterior regions of insect embryos and is important for abdomen formation. In the present work, we show the spatial and temporal expression profiles of this important gene, which is involved in oogenesis and early development. Data mining techniques were used to obtain the complete sequence of nanos. Bioinformatic tools were used to determine the following: (1) the secondary structure of the 3'-untranslated region of the mRNA, (2) the protein encoded by the isolated gene, (3) the conserved zinc-finger domains of the Nanos protein, and (4) phylogenetic relationships. Furthermore, RNA in situ hybridization and immunolocalization were used to determine mRNA and protein expression in the tissues studied and to establish nanos as a germ cell molecular marker.
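As a rough illustration of the motif-scanning side of such a bioinformatic workflow (not the study's actual pipeline), the sketch below translates cDNA reads in three forward frames and searches for a loose CCHC-style zinc-finger pattern; the FASTA file and the pattern are hypothetical.

```python
# Illustrative sketch only: scanning translated cDNA reads for a CCHC-type
# zinc-finger signature like the one conserved in Nanos proteins.
# "ovary_cdna.fasta" and the loose motif regex are hypothetical.
import re
from Bio import SeqIO

MOTIF = re.compile(r"C.{2}C.{10,20}H.{3,5}C")   # loose CCHC-style pattern

for record in SeqIO.parse("ovary_cdna.fasta", "fasta"):
    for frame in range(3):                       # three forward reading frames
        seq = record.seq[frame:]
        seq = seq[: len(seq) - len(seq) % 3]     # trim to whole codons
        hit = MOTIF.search(str(seq.translate()))
        if hit:
            print(record.id, f"frame {frame}", hit.group())
```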