797 results for Educational data mining


Relevance:

80.00%

Publisher:

Abstract:

The aim of this work is to test the application of a probabilistic graphical model, generically known as Bayesian networks, to develop computational models that can be used to aid the understanding of problems and/or the forecasting of economic variables. To this end, a problem widely addressed in the literature was chosen, and the well-established theoretical and experimental results were compared with those obtained using the proposed technique. A model was therefore built to classify the trend of the "country risk" for Brazil from a database of macroeconomic and financial variables. The EMBI+ (Emerging Markets Bond Index Plus) was adopted as the risk measure because it is an indicator widely used by the market.

Relevance:

80.00%

Publisher:

Abstract:

The domain of Knowledge Discovery (KD) and Data Mining (DM) is of growing importance at a time when more and more data is produced and knowledge is one of the most precious assets. Having explored the existing underlying theory, the results of ongoing research in academia and the industry practices in the domain of KD and DM, we have found that this domain still lacks some systematization. We also found that this systematization exists to a greater degree in the Software Engineering and Requirements Engineering domains, probably because they are more mature areas. We believe that it is possible to improve and facilitate the participation of enterprise stakeholders in the requirements engineering for KD projects by systematizing the requirements engineering process for such projects. This will, in turn, result in more projects that end successfully, that is, with satisfied stakeholders, including in terms of time and budget constraints. With this in mind, and based on all the information found in the state of the art, we propose SysPRE, the Systematized Process for Requirements Engineering in KD projects. We begin by proposing an encompassing generic description of the KD process, where the main focus is on the Requirements Engineering activities. This description is then used as a base for the application of the Design and Engineering Methodology for Organizations (DEMO) so that we can specify a formal ontology for this process. The resulting SysPRE ontology can serve as a base not only for making enterprises aware of their own KD process and of the requirements engineering process in their KD projects, but also for improving such processes in practice, namely in terms of success rate.

Relevance:

80.00%

Publisher:

Abstract:

Layer mortality due to heat stress is an important economic loss for the producer. The aim of this study was to determine the mortality pattern of layers reared in the region of Bastos, SP, Brazil, according to external environment and bird age. Data mining techniques were applied to monthly mortality records of hens in production from 135 poultry houses, from January 2004 to August 2008. The external environment was characterized according to maximum and minimum temperatures, obtained monthly at the CATI meteorological station in the city of Tupa, SP, Brazil. Mortality was classified as normal (<= 1.2%) or high (> 1.2%), considering the mortality limits mentioned in the literature. The data mining technique produced a decision tree with nine levels and 23 leaves, with an overall accuracy of 62.6%. The hit rate was 64.1% for the High class and 59.9% for the Normal class. The decision tree made it possible to find a pattern in the mortality data, generating a model for estimating mortality based on the thermal environment and bird age.
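The labeling rule quoted in the abstract can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names are hypothetical, the sample labels in the usage note are invented, and "hit rate" is assumed here to mean the share of records of a given true class that the model predicted correctly. It does not reproduce the decision tree from the study.

```python
from typing import Sequence

# Threshold quoted in the abstract: normal (<= 1.2%) or high (> 1.2%).
MORTALITY_THRESHOLD_PERCENT = 1.2


def classify_mortality(rate_percent: float) -> str:
    """Label a monthly mortality rate (in percent) as 'Normal' or 'High'."""
    return "Normal" if rate_percent <= MORTALITY_THRESHOLD_PERCENT else "High"


def hit_rate(actual: Sequence[str], predicted: Sequence[str], label: str) -> float:
    """Share of records whose true class is `label` that were predicted as `label`.

    Assumption: this per-class reading of "hit rate" is ours, not the study's.
    """
    pairs = [(a, p) for a, p in zip(actual, predicted) if a == label]
    if not pairs:
        raise ValueError(f"no records with true class {label!r}")
    return sum(1 for _, p in pairs if p == label) / len(pairs)
```

For instance, `classify_mortality(1.2)` falls in the Normal class because the limit is inclusive, and `hit_rate(["High", "High", "Normal"], ["High", "Normal", "Normal"], "High")` evaluates one correct prediction out of two High records.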

Relevance:

80.00%

Publisher:

Abstract:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevance:

80.00%

Publisher:

Abstract:

Foliar diagnosis is a method for assessing the nutritional status of agricultural crops, which helps in understanding soil fertility and in rationalizing the application of fertilizers, taking into account economic and environmental criteria. The study aimed to use the land relief as a criterion to assist in interpreting the spatial variability of the nutrient content of citrus leaves. The leaves were collected at regular intervals of 50 m, totaling 332 sampling points. Data were analyzed by descriptive statistics, geostatistics and decision tree induction. With the aid of a digital elevation model (MDE) and the planialtimetric profile, the area was divided into three different land relief forms and sub-strands. The highest values for nutrients from the citrus leaves were observed at the top (concave area) segments on a half-slope and lower slope. The nutrients from the citrus leaves showed high correlation values (above 0.5) with the altitude of the study area. Geostatistics and decision tree induction show that the relief is the variable with the greatest potential for interpreting the maps of spatial variability of nutrients from the citrus leaves.

Relevance:

80.00%

Publisher:

Abstract:

Rising healthcare costs are a main topic for complementary health companies in Brazil. In 2011, these expenses consumed more than 80% of the monthly health insurance in Brazil. Once administrative costs are considered, the companies operating in this market work, on average, at the threshold between profit and loss. This paper presents the results of an investigation of the welfare costs of a health plan company in Brazil. It was based on the KDD process and exploratory Data Mining. A variety of results is presented, such as data summarization, providing compact descriptions of the data and revealing common features and intrinsic observations. Among the key findings, it was observed that a small portion of the population accounts for most of the demand for resources devoted to health care.

Relevance:

80.00%

Publisher:

Abstract:

Currently, one of the biggest challenges for the field of data mining is to perform cluster analysis on complex data. Several techniques have been proposed but, in general, they achieve good results only within specific areas, and there is no consensus on the best way to group this kind of data. In general, these techniques fail due to unrealistic assumptions about the true probability distribution of the data. Based on this, this thesis proposes a new measure based on Cross Information Potential that uses representative points of the dataset and statistics extracted directly from the data to measure the interaction between groups. The proposed approach allows us to use all the advantages of this information-theoretic descriptor and overcomes the limitations imposed on it by its own nature. From this, two cost functions and three algorithms have been proposed to perform cluster analysis. As the use of Information Theory captures the relationship between different patterns, regardless of assumptions about the nature of this relationship, the proposed approach was able to achieve better performance than the main algorithms in the literature. These results apply to synthetic data designed to test the algorithms in specific situations and to real data extracted from problems in different fields.
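For readers unfamiliar with the descriptor named above, the following Python sketch computes the standard Parzen-window estimate of the Cross Information Potential between two groups of points, using Gaussian kernels. It is not the new measure proposed in the thesis (which relies on representative points and data-derived statistics that the abstract does not specify); the function names and the kernel size `sigma` are assumptions made for illustration.

```python
import math
from typing import Sequence

Point = Sequence[float]


def gaussian_kernel(u: Point, v: Point, sigma: float) -> float:
    """Isotropic Gaussian density with standard deviation `sigma`, evaluated at u - v."""
    dim = len(u)
    squared_distance = sum((a - b) ** 2 for a, b in zip(u, v))
    normalizer = (2.0 * math.pi * sigma ** 2) ** (dim / 2.0)
    return math.exp(-squared_distance / (2.0 * sigma ** 2)) / normalizer


def cross_information_potential(
    group_a: Sequence[Point], group_b: Sequence[Point], sigma: float
) -> float:
    """Estimate the integral of p_a(x) * p_b(x) from samples of each group.

    With Gaussian Parzen windows of size `sigma` on both groups, the integral
    reduces to the average of Gaussian kernels of size sigma * sqrt(2) over
    all cross-group pairs of points.
    """
    if not group_a or not group_b:
        raise ValueError("both groups must contain at least one point")
    pair_sigma = sigma * math.sqrt(2.0)
    total = sum(gaussian_kernel(x, y, pair_sigma) for x in group_a for y in group_b)
    return total / (len(group_a) * len(group_b))
```

A larger value indicates stronger overlap between the two groups, so two nearby clusters yield a higher estimate than two distant ones for the same `sigma`. The double loop costs one kernel evaluation per cross-group pair, which is one reason a measure built on a few representative points, as the abstract describes, can be attractive.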

Relevance:

80.00%

Publisher:

Abstract:

The opening of the Brazilian electricity market and the competition between companies in the energy sector have increased the concessionaires' search for useful information and tools to assist in decision-making activities. An important source of knowledge for these utilities is the time series of energy demand. The identification of behavior patterns and the description of events become important for planning, in pursuit of improvements in service quality and financial benefits. This dissertation presents a methodology based on time series mining and representation tools to extract knowledge relating series of electricity demand at various interconnected substations of an electric utility. The method exploits the relationships of duration, coincidence and partial order of events in multidimensional time series. The knowledge is represented using the language proposed by Mörchen (2005), called Time Series Knowledge Representation (TSKR). We conducted a case study using time series of energy demand from 8 substations interconnected by a ring system that feeds the metropolitan area of Goiânia-GO, provided by CELG (Companhia Energética de Goiás), which is responsible for power distribution in the state of Goiás (Brazil). Using the proposed methodology, three levels of knowledge were extracted that describe the behavior of the system studied, clearly representing the system dynamics and providing a tool to assist planning activities.

Relevance:

80.00%

Publisher:

Abstract:

Self-organizing maps (SOM) are artificial neural networks widely used in the data mining field, mainly because they constitute a dimensionality reduction technique given the fixed grid of neurons associated with the network. In order to properly partition and visualize the SOM network, the various methods available in the literature must be applied in a post-processing stage that consists of inferring, through its neurons, relevant characteristics of the data set. In general, such processing applied to the network neurons, instead of the entire database, reduces the computational costs due to vector quantization. This work proposes a post-processing of the SOM neurons in the input and output spaces, combining visualization techniques with algorithms based on gravitational forces and the search for the shortest path with the greatest reward. Such methods take into account the connection strength between neighbouring neurons and characteristics of pattern density and distances among neurons, both associated with the position that the neurons occupy in the data space after training the network. Thus, the goal consists of defining more clearly the arrangement of the clusters present in the data. Experiments were carried out to evaluate the proposed methods using various artificially generated data sets, as well as real-world data sets. The results obtained were compared with those from a number of well-known methods in the literature.

Relevance:

80.00%

Publisher:

Abstract:

This exploratory study presents readings such as Doyé (2003), Carrasco Perea (2003), Melo Araújo e Sá (2004), Chavagne (2009) and Alas-Martins (2010; 2011), which helped to show that a plurilingual environment can tend to improve the understanding of written texts in the mother tongue and can contribute to a better perception of the world around a person, with all its different nuances. The study describes the methodology and some results of our doctoral research, which resulted in the insertion of the experimental discipline called Intercomprehension of Romanic Languages (ILR) in the curriculum in the city of Natal / RN / Brazil; the insertion was justified by the high degree of functional illiteracy among young people up to 15 years old, according to the educational data from the IBGE research in 2010. The results were verified through experimental action research, as characterized by Lewin (1946), Nunan (1992), Thiollent (1994) and Trip (2005), in two schools: Professora Terezinha Paulino de Lima (municipal school) and Professora Ana Julia de Carvalho Mousinho (State of Rio Grande do Norte), with 95 students from the final years of primary education. The corpus of this research was subjected to a series of techniques, such as the nonparametric test of Kruskal and Wallis (1952) and the parametric ANOVA test, in an effort to give statistical significance to the analysis of the results recorded in the ILR activity book. 
The research presented some views on reading comprehension skills for written texts from the perspective of Ringbow (1987), Giacobbe (1990), Alarcão (1991; 2009a and 2009b), Corder (1992), Castellotti (2001) and Degache (2003), and on the possibility of transferring these skills to the learning of Portuguese, as pointed out by Meissner, Klein and Stegmann (2004); it indicates a positive trend towards the understanding of the mother tongue (LM), based on an analysis of the scores of written tests and texts produced by participants in solving tasks.
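As a rough illustration of the nonparametric test mentioned in this abstract, the Python sketch below computes the Kruskal-Wallis H statistic for several groups of scores. It is a minimal version under stated assumptions: tied values receive their average rank, no tie correction is applied to H, no p-value is computed, and the function names are hypothetical. It does not reproduce the study's analysis or data.

```python
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks of `values`, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next sorted value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared_rank = (start + end) / 2.0 + 1.0
        for k in range(start, end + 1):
            ranks[order[k]] = shared_rank
        start = end + 1
    return ranks


def kruskal_wallis_h(groups: Sequence[Sequence[float]]) -> float:
    """Kruskal-Wallis H statistic (without tie correction) for two or more groups."""
    if len(groups) < 2 or any(len(g) == 0 for g in groups):
        raise ValueError("need at least two non-empty groups")
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks = average_ranks(pooled)
    weighted_sum = 0.0
    offset = 0
    for g in groups:
        rank_sum = sum(ranks[offset:offset + len(g)])
        weighted_sum += rank_sum ** 2 / len(g)
        offset += len(g)
    return 12.0 / (n_total * (n_total + 1)) * weighted_sum - 3.0 * (n_total + 1)
```

H is computed from rank sums rather than raw scores, which is why the test does not assume normally distributed data, unlike ANOVA. Under the null hypothesis, H is approximately chi-squared distributed with one fewer degrees of freedom than the number of groups.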