791 resultados para Web Data Mining


Relevância:

80.00% 80.00%

Publicador:

Resumo:

Oxidative stress generating active oxygen species has been proved to be one of the underlying agents causing tissue injury after the exposure of Eucalyptus (Eucalyptus spp.) plants to a wide variety of stress conditions. The objective of this study was to perform data mining to identify favorable genes and alleles associated with the enzyme systems superoxide dismutase, catalase, peroxidases, and glutathione S-transferase that are related to tolerance for environmental stresses and damage caused by pests, diseases, herbicides, and by weeds themselves. This was undertaken by using the eucalyptus expressed-sequence database (https//forests.esalq.usp.br). The alignment results between amino acid and nucleotide sequences indicated that the studied enzymes were adequately represented in the ESTs database of the FORESTs project.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Concept drift is a problem of increasing importance in machine learning and data mining. Data sets under analysis are no longer only static databases, but also data streams in which concepts and data distributions may not be stable over time. However, most learning algorithms produced so far are based on the assumption that data comes from a fixed distribution, so they are not suitable to handle concept drifts. Moreover, some concept drifts applications requires fast response, which means an algorithm must always be (re) trained with the latest available data. But the process of labeling data is usually expensive and/or time consuming when compared to unlabeled data acquisition, thus only a small fraction of the incoming data may be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them are also based on the assumption that the data is static. Therefore, semi-supervised learning with concept drifts is still an open challenge in machine learning. Recently, a particle competition and cooperation approach was used to realize graph-based semi-supervised learning from static data. In this paper, we extend that approach to handle data streams and concept drift. The result is a passive algorithm using a single classifier, which naturally adapts to concept changes, without any explicit drift detection mechanism. Its built-in mechanisms provide a natural way of learning from new data, gradually forgetting older knowledge as older labeled data items became less influent on the classification of newer data items. Some computer simulation are presented, showing the effectiveness of the proposed method.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

As a new modeling method, support vector regression (SVR) has been regarded as the state-of-the-art technique for regression and approximation. In this study, the SVR models had been introduced and developed to predict body and carcass-related characteristics of 2 strains of broiler chicken. To evaluate the prediction ability of SVR models, we compared their performance with that of neural network (NN) models. Evaluation of the prediction accuracy of models was based on the R-2, MS error, and bias. The variables of interest as model output were BW, empty BW, carcass, breast, drumstick, thigh, and wing weight in 2 strains of Ross and Cobb chickens based on intake dietary nutrients, including ME (kcal/bird per week), CP, TSAA, and Lys, all as grams per bird per week. A data set composed of 64 measurements taken from each strain were used for this analysis, where 44 data lines were used for model training, whereas the remaining 20 lines were used to test the created models. The results of this study revealed that it is possible to satisfactorily estimate the BW and carcass parts of the broiler chickens via their dietary nutrient intake. Through statistical criteria used to evaluate the performance of the SVR and NN models, the overall results demonstrate that the discussed models can be effective for accurate prediction of the body and carcass-related characteristics investigated here. However, the SVR method achieved better accuracy and generalization than the NN method. This indicates that the new data mining technique (SVR model) can be used as an alternative modeling tool for NN models. However, further reevaluation of this algorithm in the future is suggested.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper describes a data mining environment for knowledge discovery in bioinformatics applications. The system has a generic kernel that implements the mining functions to be applied to input primary databases, with a warehouse architecture, of biomedical information. Both supervised and unsupervised classification can be implemented within the kernel and applied to data extracted from the primary database, with the results being suitably stored in a complex object database for knowledge discovery. The kernel also includes a specific high-performance library that allows designing and applying the mining functions in parallel machines. The experimental results obtained by the application of the kernel functions are reported. © 2003 Elsevier Ltd. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper presents the analysis that have been carried out in the alarm system of the DCRanger EMS. The intention of this study is to present the problem of alarm processing in electric energy control centers, its various aspects and operational difficulties due to operator needs. Some tests are produced in order to identify the desirable features an alarm system should possess in order to be of effective help in the operative duty. © 2006 IEEE.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Hemoglobinopathies were included in the Brazilian Neonatal Screening Program on June 6, 2001. Automated high-performance liquid chromatography (HPLC) was indicated as one of the diagnostic methods. The amount of information generated by these systems is immense, and the behavior of groups cannot always be observed in individual analyses. Three-dimensional (3-D) visualization techniques can be applied to extract this information, for extracting patterns, trends or relations from the results stored in databases. We applied the 3-D visualization tool to analyze patterns in the results of hemoglobinopathy based on neonatal diagnosis by HPLC. The laboratory results of 2520 newborn analyses carried out in 2001 and 2002 were used. The Fast, F1, F and A peaks, which were detected by the analytical system, were chosen as attributes for mapping. To establish a behavior pattern, the results were classified into groups according to hemoglobin phenotype: normal (N = 2169), variant (N = 73) and thalassemia (N = 279). 3-D visualization was made with the FastMap DB tool; there were two distribution patterns in the normal group, due to variation in the amplitude of the values obtained by HPLC for the F1 window. It allowed separation of the samples with normal Hb from those with alpha thalassemia, based on a significant difference (P > 0.05) between the mean values of the Fast and A peaks, demonstrating the need for better evaluation of chromatograms; this method could be used to help diagnose alpha thalassemia in newborns.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

This paper describes an investigation of the hybrid PSO/ACO algorithm to classify automatically the well drilling operation stages. The method feasibility is demonstrated by its application to real mud-logging dataset. The results are compared with bio-inspired methods, and rule induction and decision tree algorithms for data mining. © 2009 Springer Berlin Heidelberg.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Includes bibliography

Relevância:

80.00% 80.00%

Publicador:

Resumo:

In this work, a new approach for supervised pattern recognition is presented which improves the learning algorithm of the Optimum-Path Forest classifier (OPF), centered on detection and elimination of outliers in the training set. Identification of outliers is based on a penalty computed for each sample in the training set from the corresponding number of imputable false positive and false negative classification of samples. This approach enhances the accuracy of OPF while still gaining in classification time, at the expense of a slight increase in training time. © 2010 Springer-Verlag.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Aiming to ensure greater reliability and consistency of data stored in the database, the data cleaning stage is set early in the process of Knowledge Discovery in Databases (KDD) and is responsible for eliminating problems and adjust the data for the later stages, especially for the stage of data mining. Such problems occur in the instance level and schema, namely, missing values, null values, duplicate tuples, values outside the domain, among others. Several algorithms were developed to perform the cleaning step in databases, some of them were developed specifically to work with the phonetics of words, since a word can be written in different ways. Within this perspective, this work presents as original contribution an optimization of algorithm for the detection of duplicate tuples in databases through phonetic based on multithreading without the need for trained data, as well as an independent environment of language to be supported for this. © 2011 IEEE.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

The significant volume of work accidents in the cities causes an expressive loss to society. The development of Spatial Data Mining technologies presents a new perspective for the extraction of knowledge from the correlation between conventional and spatial attributes. One of the most important techniques of the Spatial Data Mining is the Spatial Clustering, which clusters similar spatial objects to find a distribution of patterns, taking into account the geographical position of the objects. Applying this technique to the health area, will provide information that can contribute towards the planning of more adequate strategies for the prevention of work accidents. The original contribution of this work is to present an application of tools developed for Spatial Clustering which supply a set of graphic resources that have helped to discover knowledge and support for management in the work accidents area. © 2011 IEEE.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Malicious programs (malware) can cause severe damage on computer systems and data. The mechanism that the human immune system uses to detect and protect from organisms that threaten the human body is efficient and can be adapted to detect malware attacks. In this paper we propose a system to perform malware distributed collection, analysis and detection, this last inspired by the human immune system. After collecting malware samples from Internet, they are dynamically analyzed so as to provide execution traces at the operating system level and network flows that are used to create a behavioral model and to generate a detection signature. Those signatures serve as input to a malware detector, acting as the antibodies in the antigen detection process. This allows us to understand the malware attack and aids in the infection removal procedures. © 2012 Springer-Verlag.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

One way to organize knowledge and make its search and retrieval easier is to create a structural representation divided by hierarchically related topics. Once this structure is built, it is necessary to find labels for each of the obtained clusters. In many cases the labels must be built using all the terms in the documents of the collection. This paper presents the SeCLAR method, which explores the use of association rules in the selection of good candidates for labels of hierarchical document clusters. The purpose of this method is to select a subset of terms by exploring the relationship among the terms of each document. Thus, these candidates can be processed by a classical method to generate the labels. An experimental study demonstrates the potential of the proposed approach to improve the precision and recall of labels obtained by classical methods only considering the terms which are potentially more discriminative. © 2012 - IOS Press and the authors. All rights reserved.

Relevância:

80.00% 80.00%

Publicador:

Resumo:

Identification and classification of overlapping nodes in networks are important topics in data mining. In this paper, a network-based (graph-based) semi-supervised learning method is proposed. It is based on competition and cooperation among walking particles in a network to uncover overlapping nodes by generating continuous-valued outputs (soft labels), corresponding to the levels of membership from the nodes to each of the communities. Moreover, the proposed method can be applied to detect overlapping data items in a data set of general form, such as a vector-based data set, once it is transformed to a network. Usually, label propagation involves risks of error amplification. In order to avoid this problem, the proposed method offers a mechanism to identify outliers among the labeled data items, and consequently prevents error propagation from such outliers. Computer simulations carried out for synthetic and real-world data sets provide a numeric quantification of the performance of the method. © 2012 Springer-Verlag.