14 results for educational data mining
in Scielo Saúde Pública - SP
Abstract:
Knowledge management encompasses every way of generating, storing, distributing, and using knowledge, which makes information technology necessary to facilitate this process given the large increase in data volume. Knowledge discovery in databases is a methodology that attempts to solve this problem, and data mining is a technique that is part of that methodology. This article develops, applies, and analyzes a data mining tool to extract knowledge about the scientific output of the people involved in research at the Universidade Federal de Lavras. The methodology involved bibliographic research, documentary research, and the case study method. The limitations found in analyzing the results indicate that the way Lattes curricula are filled in still needs to be standardized in order to refine the analyses and thereby establish indicators. The contribution was a structured database, part of a larger process of developing science and technology indicators, to support the design of new science and technology management policies and the improvement of the Brazilian higher education system.
Abstract:
Human T-cell lymphotropic virus type 1 (HTLV-1) is mainly associated with two diseases: tropical spastic paraparesis/HTLV-1-associated myelopathy (TSP/HAM) and adult T-cell leukaemia/lymphoma. This retrovirus infects 5 to 10 million individuals throughout the world. Previously, we developed a database that annotates sequence data from GenBank; the present study aimed to describe the clinical, molecular and epidemiological scenarios of HTLV-1 infection through the sequences stored in this database. A total of 2,545 registered complete and partial sequences of HTLV-1 were collected, and 1,967 (77.3%) of those sequences represented unique isolates. Among these isolates, 93% contained geographic origin information and only 39% were related to any clinical status. A total of 1,091 sequences contained information about the geographic origin and viral subtype, and 93% of these sequences were identified as subtype “a”. Ethnicity data are very scarce. Regarding clinical status data, 29% of the sequences were generated from TSP/HAM and 67.8% from healthy carrier individuals. Although the data mining enabled some inferences about specific aspects of HTLV-1 infection, the relative scarcity of data for the available sequences made it impossible to delineate a global scenario of HTLV-1 infection.
Abstract:
This paper presents a process of mining research & development abstract databases to profile current status and to project potential developments for target technologies. The process is called "technology opportunities analysis." This article steps through the process using a sample data set of abstracts from the INSPEC database on the topic of "knowledge discovery and data mining." The paper offers a set of specific indicators suitable for mining such databases to understand innovation prospects. In illustrating the uses of such indicators, it offers some insights into the status of knowledge discovery research.
Abstract:
The subject Brazil was analyzed in the French theses database DocThèses, covering the years 1969 to 1999. The data mining technique was used as a tool to obtain intelligence and knowledge. The software used to clean the DocThèses database was Infotrans, and Dataview was used for data preparation. The results of the analysis were illustrated by applying the assumptions of Zipf's law, classifying the information as trivial, interesting, or noise according to the frequency distribution. It is concluded that data mining combined with specialist software is a powerful ally in applying intelligence to decision making at all levels, including the macro level, since it provides input for the consolidation, investment, and development of actions and policies.
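As a rough illustration of the frequency-based classification described in this abstract, the sketch below ranks terms by frequency and splits them into three zones. The function name `zipf_zones` and both cutoff values are assumptions made for illustration; the abstract does not state the thresholds used in the study.

```python
from collections import Counter

def zipf_zones(terms, trivial_min=0.10, noise_max_count=1):
    """Split terms into trivial / interesting / noise by frequency.

    trivial_min: a term whose share of all occurrences is at least this
                 value is treated as trivial (hypothetical cutoff).
    noise_max_count: a term occurring at most this many times is treated
                     as noise (hypothetical cutoff).
    """
    counts = Counter(terms)
    total = sum(counts.values())
    zones = {"trivial": [], "interesting": [], "noise": []}
    # most_common() yields terms in descending frequency, i.e. Zipf rank order
    for term, n in counts.most_common():
        if n / total >= trivial_min:
            zones["trivial"].append(term)
        elif n <= noise_max_count:
            zones["noise"].append(term)
        else:
            zones["interesting"].append(term)
    return zones
```

Very frequent terms carry little new information and very rare ones are mostly noise, so the middle zone is where an analyst would look first.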
Abstract:
This study aimed at identifying different conditions of coffee plants after the harvesting period, using data mining and spectral behavior profiles from the Hyperion/EO1 sensor. The Hyperion image, with a spatial resolution of 30 m, was acquired on August 28, 2008, at the end of the coffee harvest season in the studied area. For image pre-processing, atmospheric and signal/noise effect corrections were carried out using the FLAASH and MNF (Minimum Noise Fraction Transform) algorithms, respectively. Spectral behavior profiles (38) of different coffee varieties were generated from 150 Hyperion bands. The spectral behavior profiles were analyzed with the Expectation-Maximization (EM) algorithm considering 2, 3, 4 and 5 clusters. A t-test at the 5% significance level was used to verify the similarity among the wavelength cluster means. The results demonstrated that it is possible to separate five different clusters, which comprised different coffee crop conditions, making it possible to improve future intervention actions.
Abstract:
Among the challenges of pig farming in today's competitive market is product traceability, which ensures, among many other points, animal welfare. Vocalization is a valuable tool to identify situations of stress in pigs, and it can be used in welfare records for traceability. The objective of this work was to identify stress in piglets using vocalization, classifying this stress into three levels: no stress, moderate stress, and acute stress. An experiment was conducted on a commercial farm in the municipality of Holambra, São Paulo State, where vocalizations of twenty piglets were recorded during the castration procedure and separated into two groups: without anesthesia and with local anesthesia with lidocaine base. For the recording of acoustic signals, a unidirectional microphone was connected to a digital recorder, in which signals were digitized at a frequency of 44,100 Hz. For evaluation of sound signals, Praat® software was used, and different data mining algorithms were applied using Weka® software. The selection of attributes improved model accuracy, and the best attribute selection was obtained with the Wrapper method, while the best classification algorithms were k-NN and Naive Bayes. According to the results, it was possible to classify the level of stress in pigs through their vocalization.
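To make the k-NN classifier named in this abstract concrete, here is a minimal sketch of the general technique. It is not the Weka implementation used in the study, and the function name `knn_predict` and the shape of the training data (a list of feature-vector and label pairs) are assumptions for illustration.

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Predict a label for `query` by majority vote among its k nearest neighbours.

    train: list of (features, label) pairs, where features is a list of numbers
           (hypothetical acoustic attributes) and label is a stress level.
    """
    # Sort the training samples by Euclidean distance to the query and keep the k closest
    neighbours = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbours)
    return votes.most_common(1)[0][0]
```

In practice the attributes would usually be rescaled first, since Euclidean distance is dominated by whichever attribute has the largest numeric range.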
Abstract:
Locomotor problems prevent birds from moving freely, jeopardizing welfare and productivity, besides generating injuries on the legs of chickens. The objective of this study was to evaluate, using data mining, the influence of age, use of vitamin D, asymmetry of limbs and gait score on the degree of leg injuries in broilers. The analysis was performed on a data set obtained from a field experiment that used two groups of 30 birds each, a control group and one treated with vitamin D. The gait score, the asymmetry between the right and left toes, and the degree of leg injuries were evaluated. The Weka® software was used for data mining. In particular, the C4.5 algorithm (also known as J48 in the Weka environment) was used to generate a decision tree. The results showed that age is the factor that most influences the degree of leg injuries and that the data from assessments of gait score were not reliable for estimating leg weakness in broilers.
Abstract:
The aim of this study was to group, by similarity, temporal profiles of the 10-day composite NDVI product obtained by the SPOT Vegetation sensor for municipalities with high soybean production in the state of Paraná, Brazil, in the 2005/2006 cropping season. Data mining is a valuable tool that allows extracting knowledge from a database, identifying valid, new, potentially useful and understandable patterns. Clusters were therefore generated with the K-Means, MAXVER and DBSCAN algorithms, implemented in the WEKA software package. Clusters were created based on the average temporal NDVI profiles of the 277 municipalities with high soybean production in the state, and the best results were found with the K-Means algorithm, which grouped the municipalities into six clusters for the period from the beginning of October until the end of March, which is equivalent to the crop vegetative cycle. Half of the generated clusters presented a spectro-temporal pattern characteristic of soybeans and were mostly located in the soybean belt of the state of Paraná, which shows that the proposed methodology gave good results for identifying homogeneous areas. These results will be useful for the creation of regional soybean "masks" to estimate the planted area for this crop.
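The K-Means grouping of temporal profiles can be sketched as follows. This is a plain textbook version, not the WEKA implementation used in the study; the function names, the random initialisation, and the representation of each profile as a list of NDVI values (one per 10-day composite) are assumptions.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two equal-length profiles."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(profiles, k, iterations=100, seed=0):
    """Cluster profiles into k groups; returns (labels, centroids)."""
    rng = random.Random(seed)
    # Initialise centroids with k distinct profiles chosen at random
    centroids = [list(p) for p in rng.sample(profiles, k)]
    labels = None
    for _ in range(iterations):
        # Assignment step: each profile goes to its nearest centroid
        new_labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c]))
                      for p in profiles]
        if new_labels == labels:
            break  # assignments stopped changing, so the centroids are stable
        labels = new_labels
        # Update step: each centroid becomes the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(profiles, labels) if lab == c]
            if members:
                centroids[c] = [sum(v) / len(members) for v in zip(*members)]
    return labels, centroids
```

Each resulting centroid is itself an average temporal profile, which is what makes the clusters interpretable as crop patterns.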
Abstract:
This study aimed to identify differences in swine vocalization pattern according to animal gender and different stress conditions. A total of 150 barrows and 150 females (Dalland® genetic strain), aged 100 days, were used in the experiment. Pigs were exposed to different stressful situations: thirst (no access to water), hunger (no access to food), and thermal stress (THI exceeding 74). For the control treatment, animals were kept under a comfort situation (full access to food and water, with environmental THI lower than 70). Acoustic signals were recorded every 30 minutes, totaling six samples for each stress situation. Afterwards, the audio recordings were analyzed with Praat® 5.1.19 software, generating a sound spectrum. For determination of stress conditions, data were processed with WEKA® 3.5 software, using the decision tree algorithm C4.5, known as J48 in the software environment, with 10-fold cross-validation (each fold holding out 10% of the samples). According to the decision tree, the most important acoustic attribute for the classification of stress conditions was sound intensity (root node). It was not possible to identify the animal gender by vocal register using the tested attributes. A decision tree was generated for recognition of situations of swine hunger, thirst, and heat stress from records of sound intensity, pitch frequency, and Formant 1.
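The 10-fold cross-validation mentioned in this abstract can be sketched in a few lines. This is a generic illustration rather than WEKA's own procedure (stratification by class is omitted here); the function name `k_fold_indices` and the fixed seed are assumptions.

```python
import random

def k_fold_indices(n_samples, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for k-fold cross-validation."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    # Fold i takes every k-th shuffled index, so fold sizes differ by at most one
    folds = [idx[i::k] for i in range(k)]
    for i in range(k):
        test = folds[i]
        # The training set is the union of all the other folds
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, test
```

Every sample is used for testing exactly once, and the reported accuracy is the average over the k held-out folds.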
Abstract:
Digital information generates the possibility of a high degree of redundancy in the data available for fitting predictive models used for Digital Soil Mapping (DSM). Among these models, the Decision Tree (DT) technique has been increasingly applied due to its capacity for dealing with large datasets. The purpose of this study was to evaluate the impact of the data volume used to generate the DT models on the quality of soil maps. An area of 889.33 km² was chosen in the Northern region of the State of Rio Grande do Sul. The soil-landscape relationship was obtained from reambulation of the studied area and the alignment of the units in the 1:50,000 scale topographic mapping. Six predictive covariates linked to the soil formation factors relief and organisms, together with data sets of 1, 3, 5, 10, 15, 20 and 25 % of the total data volume, were used to generate the predictive DT models in the data mining program Waikato Environment for Knowledge Analysis (WEKA). In this study, sample densities below 5 % resulted in models less able to capture the complexity of the spatial distribution of the soil in the study area. The relation between the data volume to be handled and the predictive capacity of the models was best for samples between 5 and 15 %. For the models based on these sample densities, the collected field data indicated an accuracy of predictive mapping close to 70 %.
Abstract:
This study aimed to describe digital disease detection and participatory surveillance in different countries. The systems or platforms consolidated in the scientific field were analyzed by describing the strategy, type of data source, main objectives, and manner of interaction with users. Eleven systems or platforms, developed from 1996 to 2016, were analyzed. There was a higher frequency of data mining on the web and active crowdsourcing, as well as a trend in the use of mobile applications. It is important to provoke debate in academia and health services for the evolution of methods and insights into participatory surveillance in the digital age.
Abstract:
Given the limitations of different types of remote sensing images, automated land-cover classifications of the Amazon várzea may yield poor accuracy indexes. One way to improve accuracy is through the combination of images from different sensors, by either image fusion or multi-sensor classifications. Therefore, the objective of this study was to determine which classification method is more efficient in improving land cover classification accuracies for the Amazon várzea and similar wetland environments: (a) synthetically fused optical and SAR images or (b) multi-sensor classification of paired SAR and optical images. Land cover classifications based on images from a single sensor (Landsat TM or Radarsat-2) are compared with multi-sensor and image fusion classifications. Object-based image analyses (OBIA) and the J48 data-mining algorithm were used for automated classification, and classification accuracies were assessed using the kappa index of agreement and the recently proposed allocation and quantity disagreement measures. Overall, optical-based classifications had better accuracy than SAR-based classifications. Once both datasets were combined using the multi-sensor approach, there was a 2% decrease in allocation disagreement, as the method was able to overcome part of the limitations present in both images. Accuracy decreased when image fusion methods were used, however. We therefore concluded that the multi-sensor classification method is more appropriate for classifying land cover in the Amazon várzea.
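The three accuracy measures named in this abstract can all be computed from a confusion matrix. The sketch below follows the commonly used definitions of the kappa index and of quantity and allocation disagreement. The function name `agreement_measures` is an assumption, and the code assumes the matrix entries are sample counts that already reflect population proportions (for example, from simple random sampling).

```python
def agreement_measures(matrix):
    """Return (kappa, quantity disagreement, allocation disagreement).

    matrix[i][j] = number of samples mapped as class i whose reference class is j.
    Assumes the classification is not already at chance-expected agreement of 1.
    """
    k = len(matrix)
    total = sum(sum(row) for row in matrix)
    p = [[cell / total for cell in row] for row in matrix]
    row = [sum(p[i]) for i in range(k)]                       # mapped proportions
    col = [sum(p[i][j] for i in range(k)) for j in range(k)]  # reference proportions
    observed = sum(p[g][g] for g in range(k))
    expected = sum(row[g] * col[g] for g in range(k))
    kappa = (observed - expected) / (1 - expected)
    # Quantity disagreement: mismatch in the proportions of each class
    quantity = sum(abs(col[g] - row[g]) for g in range(k)) / 2
    # Allocation disagreement: mismatch in the spatial allocation of each class
    allocation = sum(2 * min(row[g] - p[g][g], col[g] - p[g][g])
                     for g in range(k)) / 2
    return kappa, quantity, allocation
```

Quantity and allocation disagreement sum to the total disagreement (one minus the overall agreement), which is why a drop in allocation disagreement translates directly into better overall accuracy when quantity disagreement is unchanged.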
Abstract:
Over the past three decades, pedotransfer functions (PTFs) have been widely used by soil scientists to estimate soil properties in temperate regions in response to the lack of soil data for these regions. Several authors indicated that little effort has been dedicated to the prediction of soil properties in the humid tropics, where the need for soil property information is of even greater priority. The aim of this paper is to provide an up-to-date repository of past and recently published articles as well as papers from proceedings of events dealing with water-retention PTFs for soils of the humid tropics. Of the 35 publications found in the literature on PTFs for prediction of water retention of soils of the humid tropics, 91 % of the PTFs are based on an empirical approach, and only 9 % are based on a semi-physical approach. Of the empirical PTFs, 97 % are continuous, and 3 % (one) is a class PTF; of the empirical PTFs, 97 % are based on multiple linear and nth-order polynomial regression techniques, and 3 % (one) is based on the k-Nearest Neighbor approach; 84 % of the continuous PTFs are point-based, and 16 % are parameter-based; 97 % of the continuous PTFs are equation-based PTFs, and 3 % (one) is based on pattern recognition. Additionally, it was found that 26 % of the tropical water-retention PTFs were developed for soils in Brazil, 26 % for soils in India, 11 % for soils in other countries in America, and 11 % for soils in other countries in Africa.
Abstract:
The objective of this work was to analyze the spatiotemporal behavior of rainfall in the state of Rio Grande do Sul between the decades 1987-1996 and 1997-2006, using data mining techniques. The historical series were obtained from the Hidroweb hydrological information system. The methodology was based on the CRISP-DM (Cross Industry Standard Process for Data Mining) model. Pluviometrically homogeneous areas were defined for the decades 1987-1996 and 1997-2006. Then, by overlaying the clusters obtained for the two periods, six zones common to both decades (A to F) were found. The changes were evaluated at annual, seasonal, and monthly time scales. The results indicated significant increases (20 to 240 mm) in annual precipitation in all zones except zone A. In the seasonal analysis, the variations were random, although in spring all zones showed a significant increase (44 to 142 mm). In the monthly analysis, the reduction in January stands out in all zones except E. In the other months, the variations were random. The results show that, between the decades, rainfall volume changed at all time scales analyzed.