16 resultados para Data Mining, Big Data, Consumi energetici, Weka Data Cleaning
em Scielo Saúde Pública - SP
Resumo:
Among the challenges of pig farming in today's competitive market, there is factor of the product traceability that ensures, among many points, animal welfare. Vocalization is a valuable tool to identify situations of stress in pigs, and it can be used in welfare records for traceability. The objective of this work was to identify stress in piglets using vocalization, calling this stress on three levels: no stress, moderate stress, and acute stress. An experiment was conducted on a commercial farm in the municipality of Holambra, São Paulo State , where vocalizations of twenty piglets were recorded during the castration procedure, and separated into two groups: without anesthesia and local anesthesia with lidocaine base. For the recording of acoustic signals, a unidirectional microphone was connected to a digital recorder, in which signals were digitized at a frequency of 44,100 Hz. For evaluation of sound signals, Praat® software was used, and different data mining algorithms were applied using Weka® software. The selection of attributes improved model accuracy, and the best attribute selection was used by applying Wrapper method, while the best classification algorithms were the k-NN and Naive Bayes. According to the results, it was possible to classify the level of stress in pigs through their vocalization.
Resumo:
Locomotor problems prevent the bird to move freely, jeopardizing the welfare and productivity, besides generating injuries on the legs of chickens. The objective of this study was to evaluate the influence of age, use of vitamin D, the asymmetry of limbs and gait score, the degree of leg injuries in broilers, using data mining. The analysis was performed on a data set obtained from a field experiment in which it was used two groups of birds with 30 birds each, a control group and one treated with vitamin D. It was evaluated the gait score, the asymmetry between the right and left toes, and the degree of leg injuries. The Weka ® software was used in data mining. In particular, C4.5 algorithm (also known as J48 in Weka environment) was used for the generation of a decision tree. The results showed that age is the factor that most influences the degree of leg injuries and that the data from assessments of gait score were not reliable to estimate leg weakness in broilers.
Resumo:
The aim of this study was to group temporal profiles of 10-day composites NDVI product by similarity, which was obtained by the SPOT Vegetation sensor, for municipalities with high soybean production in the state of Paraná, Brazil, in the 2005/2006 cropping season. Data mining is a valuable tool that allows extracting knowledge from a database, identifying valid, new, potentially useful and understandable patterns. Therefore, it was used the methods for clusters generation by means of the algorithms K-Means, MAXVER and DBSCAN, implemented in the WEKA software package. Clusters were created based on the average temporal profiles of NDVI of the 277 municipalities with high soybean production in the state and the best results were found with the K-Means algorithm, grouping the municipalities into six clusters, considering the period from the beginning of October until the end of March, which is equivalent to the crop vegetative cycle. Half of the generated clusters presented spectro-temporal pattern, a characteristic of soybeans and were mostly under the soybean belt in the state of Paraná, which shows good results that were obtained with the proposed methodology as for identification of homogeneous areas. These results will be useful for the creation of regional soybean "masks" to estimate the planted area for this crop.
Resumo:
This study aimed to identify differences in swine vocalization pattern according to animal gender and different stress conditions. A total of 150 barrow males and 150 females (Dalland® genetic strain), aged 100 days, were used in the experiment. Pigs were exposed to different stressful situations: thirst (no access to water), hunger (no access to food), and thermal stress (THI exceeding 74). For the control treatment, animals were kept under a comfort situation (animals with full access to food and water, with environmental THI lower than 70). Acoustic signals were recorded every 30 minutes, totaling six samples for each stress situation. Afterwards, the audios were analyzed by Praat® 5.1.19 software, generating a sound spectrum. For determination of stress conditions, data were processed by WEKA® 3.5 software, using the decision tree algorithm C4.5, known as J48 in the software environment, considering cross-validation with samples of 10% (10-fold cross-validation). According to the Decision Tree, the acoustic most important attribute for the classification of stress conditions was sound Intensity (root node). It was not possible to identify, using the tested attributes, the animal gender by vocal register. A decision tree was generated for recognition of situations of swine hunger, thirst, and heat stress from records of sound intensity, Pitch frequency, and Formant 1.
Resumo:
A gestão do conhecimento abrange toda a forma de gerar, armazenar, distribuir e utilizar o conhecimento, tornando necessária a utilização de tecnologias de informação para facilitar esse processo, devido ao grande aumento no volume de dados. A descoberta de conhecimento em banco de dados é uma metodologia que tenta solucionar esse problema e o data mining é uma técnica que faz parte dessa metodologia. Este artigo desenvolve, aplica e analisa uma ferramenta de data mining, para extrair conhecimento referente à produção científica das pessoas envolvidas com a pesquisa na Universidade Federal de Lavras. A metodologia utilizada envolveu a pesquisa bibliográfica, a pesquisa documental e o método do estudo de caso. As limitações encontradas na análise dos resultados indicam que ainda é preciso padronizar o modo do preenchimento dos currículos Lattes para refinar as análises e, com isso, estabelecer indicadores. A contribuição foi gerar um banco de dados estruturado, que faz parte de um processo maior de desenvolvimento de indicadores de ciência e tecnologia, para auxiliar na elaboração de novas políticas de gestão científica e tecnológica e aperfeiçoamento do sistema de ensino superior brasileiro.
Resumo:
In this study, we concentrate on modelling gross primary productivity using two simple approaches to simulate canopy photosynthesis: "big leaf" and "sun/shade" models. Two approaches for calibration are used: scaling up of canopy photosynthetic parameters from the leaf to the canopy level and fitting canopy biochemistry to eddy covariance fluxes. Validation of the models is achieved by using eddy covariance data from the LBA site C14. Comparing the performance of both models we conclude that numerically (in terms of goodness of fit) and qualitatively, (in terms of residual response to different environmental variables) sun/shade does a better job. Compared to the sun/shade model, the big leaf model shows a lower goodness of fit and fails to respond to variations in the diffuse fraction, also having skewed responses to temperature and VPD. The separate treatment of sun and shade leaves in combination with the separation of the incoming light into direct beam and diffuse make sun/shade a strong modelling tool that catches more of the observed variability in canopy fluxes as measured by eddy covariance. In conclusion, the sun/shade approach is a relatively simple and effective tool for modelling photosynthetic carbon uptake that could be easily included in many terrestrial carbon models.
Resumo:
Human T-cell lymphotropic virus type 1 (HTLV-1) is mainly associated with two diseases: tropical spastic paraparesis/HTLV-1-associated myelopathy (TSP/HAM) and adult T-cell leukaemia/lymphoma. This retrovirus infects five-10 million individuals throughout the world. Previously, we developed a database that annotates sequence data from GenBank and the present study aimed to describe the clinical, molecular and epidemiological scenarios of HTLV-1 infection through the stored sequences in this database. A total of 2,545 registered complete and partial sequences of HTLV-1 were collected and 1,967 (77.3%) of those sequences represented unique isolates. Among these isolates, 93% contained geographic origin information and only 39% were related to any clinical status. A total of 1,091 sequences contained information about the geographic origin and viral subtype and 93% of these sequences were identified as subtype “a”. Ethnicity data are very scarce. Regarding clinical status data, 29% of the sequences were generated from TSP/HAM and 67.8% from healthy carrier individuals. Although the data mining enabled some inferences about specific aspects of HTLV-1 infection to be made, due to the relative scarcity of data of available sequences, it was not possible to delineate a global scenario of HTLV-1 infection.
Resumo:
Digital information generates the possibility of a high degree of redundancy in the data available for fitting predictive models used for Digital Soil Mapping (DSM). Among these models, the Decision Tree (DT) technique has been increasingly applied due to its capacity of dealing with large datasets. The purpose of this study was to evaluate the impact of the data volume used to generate the DT models on the quality of soil maps. An area of 889.33 km² was chosen in the Northern region of the State of Rio Grande do Sul. The soil-landscape relationship was obtained from reambulation of the studied area and the alignment of the units in the 1:50,000 scale topographic mapping. Six predictive covariates linked to the factors soil formation, relief and organisms, together with data sets of 1, 3, 5, 10, 15, 20 and 25 % of the total data volume, were used to generate the predictive DT models in the data mining program Waikato Environment for Knowledge Analysis (WEKA). In this study, sample densities below 5 % resulted in models with lower power of capturing the complexity of the spatial distribution of the soil in the study area. The relation between the data volume to be handled and the predictive capacity of the models was best for samples between 5 and 15 %. For the models based on these sample densities, the collected field data indicated an accuracy of predictive mapping close to 70 %.
Resumo:
This paper presents a process of mining research & development abstract databases to profile current status and to project potential developments for target technologies, The process is called "technology opportunities analysis." This article steps through the process using a sample data set of abstracts from the INSPEC database on the topic o "knowledge discovery and data mining." The paper offers a set of specific indicators suitable for mining such databases to understand innovation prospects. In illustrating the uses of such indicators, it offers some insights into the status of knowledge discovery research*.
Resumo:
O assunto Brasil foi analisado na base de teses francesas DocThèses, compreendendo os anos de 1969 a 1999. Utilizou-se a técnica de Data Mining como ferramenta para obter inteligência e conhecimento. O software utilizado para a limpeza da base DocThèses foi o Infotrans, e, para a preparação dos dados, empregou-se o Dataview. Os resultados da análise foram ilustrados com a aplicação dos pressupostos da Lei de Zipf, classificando-se as informações em trivial, interessante e ruído, conforme a distribuição de freqüência. Conclui-se que a técnica do Data Mining associada a softwares especialistas é uma poderosa aliada no emprego de inteligência no processo decisório em todos os níveis, inclusive o nível macro, pois oferece subsídios para a consolidação, investimento e desenvolvimento de ações e políticas.
Resumo:
This study aimed at identifying different conditions of coffee plants after harvesting period, using data mining and spectral behavior profiles from Hyperion/EO1 sensor. The Hyperion image, with spatial resolution of 30 m, was acquired in August 28th, 2008, at the end of the coffee harvest season in the studied area. For pre-processing imaging, atmospheric and signal/noise effect corrections were carried out using Flaash and MNF (Minimum Noise Fraction Transform) algorithms, respectively. Spectral behavior profiles (38) of different coffee varieties were generated from 150 Hyperion bands. The spectral behavior profiles were analyzed by Expectation-Maximization (EM) algorithm considering 2; 3; 4 and 5 clusters. T-test with 5% of significance was used to verify the similarity among the wavelength cluster means. The results demonstrated that it is possible to separate five different clusters, which were comprised by different coffee crop conditions making possible to improve future intervention actions.
Resumo:
The constant scientific production in the universities and in the research centers makes these organizations produce and acquire a great amount of data in a short period of time. Due to the big quantity of data, the research organizations become potentially vulnerable to the impacts on information booms that may cause a chaos as far as information management is concerned. In this context, the development of data catalogues comes up as one possible solution to the problems such as (I) the organization and (II) the data management. In the scientific scope, the data catalogues are implemented with the standard for digital and geospatial metadata and are broadly utilized in the process of producing a catalogue of scientific information. The aim of this work is to present the characteristics of access and storage of metadata in databank systems in order to improve the description and dissemination of scientific data. Relevant aspects will be considered and they should be analyzed during the stage of planning, once they can determine the success of implementation. The use of data catalogues by research organizations may be a way to promote and facilitate the dissemination of scientific data, avoid the repetition of efforts while being executed, as well as incentivate the use of collected, processed an also stored.
Resumo:
The nutritional status according to anthropometric data was assessed in 756 schoolchildren from 5 low-income state schools and in one private school in the same part of Rio de Janeiro, Brazil. The prevalence of stunting and wasting (cut-off point: <90% ht/age and <80% wt/ht) ranged in the public schools from 6.2 to 15.2% and 3.3 to 24.0%, respectively, whereas the figures for the private school were 2.3 and 3.5%, respectively. Much more obesity was found in the private school (18.0%) than in the state schools (0.8 - 6.2%). Nutritional problems seem to develop more severely in accordance with the increasing age of the children. Therefore it appears advisable to assess schoolchildren within the context of a nutritional surveillance system.
Resumo:
The objective of the study was to compare information collected through face-to-face interviews at first time and six years later in a city of Southeastern Brazil. In 1998, 32 mothers (N=32) of children aged 20 to 30 months answered a face-to-face interview with structured questions regarding their children's brushing habits. Six years later this same interview was repeated with the same mothers. Both interviews were compared for overall agreement, kappa and weighted kappa. Overall agreement between both interviews varied from 41 to 96%. Kappa values ranged from 0.00 to 0.65 (very bad to good) without any significant differences. The results showed lack of agreement when the same interview is conducted six years later, showing that the recall bias can be a methodological problem of interviews.