788 resultados para data mining applications


Relevância:

80.00% 80.00%

Publicador:

Resumo:

The domain of Knowledge Discovery (KD) and Data Mining (DM) is of growing importance in a time where more and more data is produced and knowledge is one of the most precious assets. Having explored both the existing underlying theory, the results of the ongoing research in academia and the industry practices in the domain of KD and DM, we have found that this is a domain that still lacks some systematization. We also found that this systematization exists to a greater degree in the Software Engineering and Requirements Engineering domains, probably due to being more mature areas. We believe that it is possible to improve and facilitate the participation of enterprise stakeholders in the requirements engineering for KD projects by systematizing requirements engineering process for such projects. This will, in turn, result in more projects that end successfully, that is, with satisfied stakeholders, including in terms of time and budget constraints. With this in mind and based on all information found in the state-of-the art, we propose SysPRE - Systematized Process for Requirements Engineering in KD projects. We begin by proposing an encompassing generic description of the KD process, where the main focus is on the Requirements Engineering activities. This description is then used as a base for the application of the Design and Engineering Methodology for Organizations (DEMO) so that we can specify a formal ontology for this process. The resulting SysPRE ontology can serve as a base that can be used not only to make enterprises become aware of their own KD process and requirements engineering process in the KD projects, but also to improve such processes in reality, namely in terms of success rate.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Layer mortality due to heat stress is an important economic loss for the producer. The aim of this study was to determine the mortality pattern of layers reared in the region of Bastos, SP, Brazil, according to external environment and bird age. Data mining technique were used based on monthly mortality records of hens in production, 135 poultry houses, from January 2004 to August 2008. The external environment was characterized according maximum and minimum temperatures, obtained monthly at the meteorological station CATI in the city of Tupa, SP, Brazil. Mortality was classified as normal (<= 1.2%) or high (> 1.2%), considering the mortality limits mentioned in literature. Data mining technique produced a decision tree with nine levels and 23 leaves, with 62.6% of overall accuracy. The hit rate for the High class was 64.1% and 59.9% for Normal class. The decision tree allowed finding a pattern in the mortality data, generating a model for estimating mortality based on the thermal environment and bird age.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Foliar diagnosis is a method for assessing the nutritional status of agricultural crops, which helps in the understanding of soil fertility and rationalized application of fertilizers taking into account economic and environmental criteria. The study aimed to use the landrelief as criteria to assist in interpreting the spatial variability of nutrient content of the citrus leaf. The leaves were collected at regular intervals of 50 m, totaling 332 sampling points. Data were analyzed by descriptive statistics, geostatistics and induction of decision tree. With the aid of digital elevation model (MDE) and the profile planaltimetric, the area was divided into three different landrelief and sub-strands. The highest values for nutrients from the leaves of citrus were observed at the top (concave area) segments on a half-slope and lower slope. The nutrients from the citrus leaves showed high values of correlation (above 0.5) with the altitude of the study area. The technique of geostatistics and the induction of decision tree show that the relief is the variable with the greatest potential to interpret the maps of spatial variability of nutrients from the citrus leaves.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The relevance of rising healthcare costs is a main topic in complementary health companies in Brazil. In 2011, these expenses consumed more than 80% of the monthly health insurance in Brazil. Considering the administrative costs, it is observed that the companies operating in this market work, on average, at the threshold between profit and loss. This paper presents results after an investigation of the welfare costs of a health plan company in Brazil. It was based on the KDD process and explorative Data Mining. A diversity of results is presented, such as data summarization, providing compact descriptions of the data, revealing common features and intrinsic observations. Among the key findings was observed that a small portion of the population is responsible for the most demanding of resources devoted to health care

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Currently, one of the biggest challenges for the field of data mining is to perform cluster analysis on complex data. Several techniques have been proposed but, in general, they can only achieve good results within specific areas providing no consensus of what would be the best way to group this kind of data. In general, these techniques fail due to non-realistic assumptions about the true probability distribution of the data. Based on this, this thesis proposes a new measure based on Cross Information Potential that uses representative points of the dataset and statistics extracted directly from data to measure the interaction between groups. The proposed approach allows us to use all advantages of this information-theoretic descriptor and solves the limitations imposed on it by its own nature. From this, two cost functions and three algorithms have been proposed to perform cluster analysis. As the use of Information Theory captures the relationship between different patterns, regardless of assumptions about the nature of this relationship, the proposed approach was able to achieve a better performance than the main algorithms in literature. These results apply to the context of synthetic data designed to test the algorithms in specific situations and to real data extracted from problems of different fields

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The monitoring of patients performed in hospitals is usually done either in a manual or semiautomated way, where the members of the healthcare team must constantly visit the patients to ascertain the health condition in which they are. The adoption of this procedure, however, compromises the quality of the monitoring conducted since the shortage of physical and human resources in hospitals tends to overwhelm members of the healthcare team, preventing them from moving to patients with adequate frequency. Given this, many existing works in the literature specify alternatives aimed at improving this monitoring through the use of wireless networks. In these works, the network is only intended for data traffic generated by medical sensors and there is no possibility of it being allocated for the transmission of data from applications present in existing user stations in the hospital. However, in the case of hospital automation environments, this aspect is a negative point, considering that the data generated in such applications can be directly related to the patient monitoring conducted. Thus, this thesis defines Wi-Bio as a communication protocol aimed at the establishment of IEEE 802.11 networks for patient monitoring, capable of enabling the harmonious coexistence among the traffic generated by medical sensors and user stations. The formal specification and verification of Wi-Bio were made through the design and analysis of Petri net models. Its validation was performed through simulations with the Network Simulator 2 (NS2) tool. The simulations of NS2 were designed to portray a real patient monitoring environment corresponding to a floor of the nursing wards sector of the University Hospital Onofre Lopes (HUOL), located at Natal, Rio Grande do Norte. Moreover, in order to verify the feasibility of Wi-Bio in terms of wireless networks standards prevailing in the market, the testing scenario was also simulated under a perspective in which the network elements used the HCCA access mechanism described in the IEEE 802.11e amendment. The results confirmed the validity of the designed Petri nets and showed that Wi-Bio, in addition to presenting a superior performance compared to HCCA on most items analyzed, was also able to promote efficient integration between the data generated by medical sensors and user applications on the same wireless network

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The opening of the Brazilian market of electricity and competitiveness between companies in the energy sector make the search for useful information and tools that will assist in decision making activities, increase by the concessionaires. An important source of knowledge for these utilities is the time series of energy demand. The identification of behavior patterns and description of events become important for the planning execution, seeking improvements in service quality and financial benefits. This dissertation presents a methodology based on mining and representation tools of time series, in order to extract knowledge that relate series of electricity demand in various substations connected of a electric utility. The method exploits the relationship of duration, coincidence and partial order of events in multi-dimensionals time series. To represent the knowledge is used the language proposed by Mörchen (2005) called Time Series Knowledge Representation (TSKR). We conducted a case study using time series of energy demand of 8 substations interconnected by a ring system, which feeds the metropolitan area of Goiânia-GO, provided by CELG (Companhia Energética de Goiás), responsible for the service of power distribution in the state of Goiás (Brazil). Using the proposed methodology were extracted three levels of knowledge that describe the behavior of the system studied, representing clearly the system dynamics, becoming a tool to assist planning activities

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Self-organizing maps (SOM) are artificial neural networks widely used in the data mining field, mainly because they constitute a dimensionality reduction technique given the fixed grid of neurons associated with the network. In order to properly the partition and visualize the SOM network, the various methods available in the literature must be applied in a post-processing stage, that consists of inferring, through its neurons, relevant characteristics of the data set. In general, such processing applied to the network neurons, instead of the entire database, reduces the computational costs due to vector quantization. This work proposes a post-processing of the SOM neurons in the input and output spaces, combining visualization techniques with algorithms based on gravitational forces and the search for the shortest path with the greatest reward. Such methods take into account the connection strength between neighbouring neurons and characteristics of pattern density and distances among neurons, both associated with the position that the neurons occupy in the data space after training the network. Thus, the goal consists of defining more clearly the arrangement of the clusters present in the data. Experiments were carried out so as to evaluate the proposed methods using various artificially generated data sets, as well as real world data sets. The results obtained were compared with those from a number of well-known methods existent in the literature

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Conselho Nacional de Desenvolvimento Científico e Tecnológico (CNPq)

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Clustering data is a very important task in data mining, image processing and pattern recognition problems. One of the most popular clustering algorithms is the Fuzzy C-Means (FCM). This thesis proposes to implement a new way of calculating the cluster centers in the procedure of FCM algorithm which are called ckMeans, and in some variants of FCM, in particular, here we apply it for those variants that use other distances. The goal of this change is to reduce the number of iterations and processing time of these algorithms without affecting the quality of the partition, or even to improve the number of correct classifications in some cases. Also, we developed an algorithm based on ckMeans to manipulate interval data considering interval membership degrees. This algorithm allows the representation of data without converting interval data into punctual ones, as it happens to other extensions of FCM that deal with interval data. In order to validate the proposed methodologies it was made a comparison between a clustering for ckMeans, K-Means and FCM algorithms (since the algorithm proposed in this paper to calculate the centers is similar to the K-Means) considering three different distances. We used several known databases. In this case, the results of Interval ckMeans were compared with the results of other clustering algorithms when applied to an interval database with minimum and maximum temperature of the month for a given year, referring to 37 cities distributed across continents

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The increase of applications complexity has demanded hardware even more flexible and able to achieve higher performance. Traditional hardware solutions have not been successful in providing these applications constraints. General purpose processors have inherent flexibility, since they perform several tasks, however, they can not reach high performance when compared to application-specific devices. Moreover, since application-specific devices perform only few tasks, they achieve high performance, although they have less flexibility. Reconfigurable architectures emerged as an alternative to traditional approaches and have become an area of rising interest over the last decades. The purpose of this new paradigm is to modify the device s behavior according to the application. Thus, it is possible to balance flexibility and performance and also to attend the applications constraints. This work presents the design and implementation of a coarse grained hybrid reconfigurable architecture to stream-based applications. The architecture, named RoSA, consists of a reconfigurable logic attached to a processor. Its goal is to exploit the instruction level parallelism from intensive data-flow applications to accelerate the application s execution on the reconfigurable logic. The instruction level parallelism extraction is done at compile time, thus, this work also presents an optimization phase to the RoSA architecture to be included in the GCC compiler. To design the architecture, this work also presents a methodology based on hardware reuse of datapaths, named RoSE. RoSE aims to visualize the reconfigurable units through reusability levels, which provides area saving and datapath simplification. The architecture presented was implemented in hardware description language (VHDL). It was validated through simulations and prototyping. To characterize performance analysis some benchmarks were used and they demonstrated a speedup of 11x on the execution of some applications

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The increasing complexity of integrated circuits has boosted the development of communications architectures like Networks-on-Chip (NoCs), as an architecture; alternative for interconnection of Systems-on-Chip (SoC). Networks-on-Chip complain for component reuse, parallelism and scalability, enhancing reusability in projects of dedicated applications. In the literature, lots of proposals have been made, suggesting different configurations for networks-on-chip architectures. Among all networks-on-chip considered, the architecture of IPNoSys is a non conventional one, since it allows the execution of operations, while the communication process is performed. This study aims to evaluate the execution of data-flow based applications on IPNoSys, focusing on their adaptation against the design constraints. Data-flow based applications are characterized by the flowing of continuous stream of data, on which operations are executed. We expect that these type of applications can be improved when running on IPNoSys, because they have a programming model similar to the execution model of this network. By observing the behavior of these applications when running on IPNoSys, were performed changes in the execution model of the network IPNoSys, allowing the implementation of an instruction level parallelism. For these purposes, analysis of the implementations of dataflow applications were performed and compared

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Data clustering is applied to various fields such as data mining, image processing and pattern recognition technique. Clustering algorithms splits a data set into clusters such that elements within the same cluster have a high degree of similarity, while elements belonging to different clusters have a high degree of dissimilarity. The Fuzzy C-Means Algorithm (FCM) is a fuzzy clustering algorithm most used and discussed in the literature. The performance of the FCM is strongly affected by the selection of the initial centers of the clusters. Therefore, the choice of a good set of initial cluster centers is very important for the performance of the algorithm. However, in FCM, the choice of initial centers is made randomly, making it difficult to find a good set. This paper proposes three new methods to obtain initial cluster centers, deterministically, the FCM algorithm, and can also be used in variants of the FCM. In this work these initialization methods were applied in variant ckMeans.With the proposed methods, we intend to obtain a set of initial centers which are close to the real cluster centers. With these new approaches startup if you want to reduce the number of iterations to converge these algorithms and processing time without affecting the quality of the cluster or even improve the quality in some cases. Accordingly, cluster validation indices were used to measure the quality of the clusters obtained by the modified FCM and ckMeans algorithms with the proposed initialization methods when applied to various data sets