60 resultados para outlier detection, data mining, gpgpu, gpu computing, supercomputing


Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Concept drift is a problem of increasing importance in machine learning and data mining. Data sets under analysis are no longer only static databases, but also data streams in which concepts and data distributions may not be stable over time. However, most learning algorithms produced so far are based on the assumption that data comes from a fixed distribution, so they are not suitable to handle concept drifts. Moreover, some concept drifts applications requires fast response, which means an algorithm must always be (re) trained with the latest available data. But the process of labeling data is usually expensive and/or time consuming when compared to unlabeled data acquisition, thus only a small fraction of the incoming data may be effectively labeled. Semi-supervised learning methods may help in this scenario, as they use both labeled and unlabeled data in the training process. However, most of them are also based on the assumption that the data is static. Therefore, semi-supervised learning with concept drifts is still an open challenge in machine learning. Recently, a particle competition and cooperation approach was used to realize graph-based semi-supervised learning from static data. In this paper, we extend that approach to handle data streams and concept drift. The result is a passive algorithm using a single classifier, which naturally adapts to concept changes, without any explicit drift detection mechanism. Its built-in mechanisms provide a natural way of learning from new data, gradually forgetting older knowledge as older labeled data items became less influent on the classification of newer data items. Some computer simulation are presented, showing the effectiveness of the proposed method.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Linear mixed effects models have been widely used in analysis of data where responses are clustered around some random effects, so it is not reasonable to assume independence between observations in the same cluster. In most biological applications, it is assumed that the distributions of the random effects and of the residuals are Gaussian. This makes inferences vulnerable to the presence of outliers. Here, linear mixed effects models with normal/independent residual distributions for robust inferences are described. Specific distributions examined include univariate and multivariate versions of the Student-t, the slash and the contaminated normal. A Bayesian framework is adopted and Markov chain Monte Carlo is used to carry out the posterior analysis. The procedures are illustrated using birth weight data on rats in a texicological experiment. Results from the Gaussian and robust models are contrasted, and it is shown how the implementation can be used for outlier detection. The thick-tailed distributions provide an appealing robust alternative to the Gaussian process in linear mixed models, and they are easily implemented using data augmentation and MCMC techniques.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

This paper describes a data mining environment for knowledge discovery in bioinformatics applications. The system has a generic kernel that implements the mining functions to be applied to input primary databases, with a warehouse architecture, of biomedical information. Both supervised and unsupervised classification can be implemented within the kernel and applied to data extracted from the primary database, with the results being suitably stored in a complex object database for knowledge discovery. The kernel also includes a specific high-performance library that allows designing and applying the mining functions in parallel machines. The experimental results obtained by the application of the kernel functions are reported. © 2003 Elsevier Ltd. All rights reserved.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

The significant volume of work accidents in the cities causes an expressive loss to society. The development of Spatial Data Mining technologies presents a new perspective for the extraction of knowledge from the correlation between conventional and spatial attributes. One of the most important techniques of the Spatial Data Mining is the Spatial Clustering, which clusters similar spatial objects to find a distribution of patterns, taking into account the geographical position of the objects. Applying this technique to the health area, will provide information that can contribute towards the planning of more adequate strategies for the prevention of work accidents. The original contribution of this work is to present an application of tools developed for Spatial Clustering which supply a set of graphic resources that have helped to discover knowledge and support for management in the work accidents area. © 2011 IEEE.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pós-graduação em Medicina Veterinária - FMVZ

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pós-graduação em Ciência da Computação - IBILCE

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pós-graduação em Ciência da Computação - IBILCE

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Pós-graduação em Agronomia (Energia na Agricultura) - FCA

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Coordenação de Aperfeiçoamento de Pessoal de Nível Superior (CAPES)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Layer mortality due to heat stress is an important economic loss for the producer. The aim of this study was to determine the mortality pattern of layers reared in the region of Bastos, SP, Brazil, according to external environment and bird age. Data mining technique were used based on monthly mortality records of hens in production, 135 poultry houses, from January 2004 to August 2008. The external environment was characterized according maximum and minimum temperatures, obtained monthly at the meteorological station CATI in the city of Tupa, SP, Brazil. Mortality was classified as normal (<= 1.2%) or high (> 1.2%), considering the mortality limits mentioned in literature. Data mining technique produced a decision tree with nine levels and 23 leaves, with 62.6% of overall accuracy. The hit rate for the High class was 64.1% and 59.9% for Normal class. The decision tree allowed finding a pattern in the mortality data, generating a model for estimating mortality based on the thermal environment and bird age.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Fundação de Amparo à Pesquisa do Estado de São Paulo (FAPESP)

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Foliar diagnosis is a method for assessing the nutritional status of agricultural crops, which helps in the understanding of soil fertility and rationalized application of fertilizers taking into account economic and environmental criteria. The study aimed to use the landrelief as criteria to assist in interpreting the spatial variability of nutrient content of the citrus leaf. The leaves were collected at regular intervals of 50 m, totaling 332 sampling points. Data were analyzed by descriptive statistics, geostatistics and induction of decision tree. With the aid of digital elevation model (MDE) and the profile planaltimetric, the area was divided into three different landrelief and sub-strands. The highest values for nutrients from the leaves of citrus were observed at the top (concave area) segments on a half-slope and lower slope. The nutrients from the citrus leaves showed high values of correlation (above 0.5) with the altitude of the study area. The technique of geostatistics and the induction of decision tree show that the relief is the variable with the greatest potential to interpret the maps of spatial variability of nutrients from the citrus leaves.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Background: Leptospirosis is an important zoonotic disease associated with poor areas of urban settings of developing countries and early diagnosis and prompt treatment may prevent disease. Although rodents are reportedly considered the main reservoirs of leptospirosis, dogs may develop the disease, may become asymptomatic carriers and may be used as sentinels for disease epidemiology. The use of Geographical Information Systems (GIS) combined with spatial analysis techniques allows the mapping of the disease and the identification and assessment of health risk factors. Besides the use of GIS and spatial analysis, the technique of data mining, decision tree, can provide a great potential to find a pattern in the behavior of the variables that determine the occurrence of leptospirosis. The objective of the present study was to apply Geographical Information Systems and data prospection (decision tree) to evaluate the risk factors for canine leptospirosis in an area of Curitiba, PR.Materials, Methods & Results: The present study was performed on the Vila Pantanal, a urban poor community in the city of Curitiba. A total of 287 dog blood samples were randomly obtained house-by-house in a two-day sampling on January 2010. In addition, a questionnaire was applied to owners at the time of sampling. Geographical coordinates related to each household of tested dog were obtained using a Global Positioning System (GPS) for mapping the spatial distribution of reagent and non-reagent dogs to leptospirosis. For the decision tree, risk factors included results of microagglutination test (MAT) from the serum of dogs, previous disease on the household, contact with rats or other dogs, dog breed, outdoors access, feeding, trash around house or backyard, open sewer proximity and flooding. A total of 189 samples (about 2/3 of overall samples) were randomly selected for the training file and consequent decision rules. The remained 98 samples were used for the testing file. The seroprevalence showed a pattern of spatial distribution that involved all the Pantanal area, without agglomeration of reagent animals. In relation to data mining, from 189 samples used in decision tree, a total of 165 (87.3%) animal samples were correctly classified, generating a Kappa index of 0.413. A total of 154 out of 159 (96.8%) samples were considered non-reagent and were correctly classified and only 5/159 (3.2%) were wrongly identified. on the other hand, only 11 (36.7%) reagent samples were correctly classified, with 19 (63.3%) samples failing diagnosis.Discussion: The spatial distribution that involved all the Pantanal area showed that all the animals in the area are at risk of contamination by Leptospira spp. Although most samples had been classified correctly by the decision tree, a degree of difficulty of separability related to seropositive animals was observed, with only 36.7% of the samples classified correctly. This can occur due to the fact of seronegative animals number is superior to the number of seropositive ones, taking the differences in the pattern of variable behavior. The data mining helped to evaluate the most important risk factors for leptospirosis in an urban poor community of Curitiba. The variables selected by decision tree reflected the important factors about the existence of the disease (default of sewer, presence of rats and rubbish and dogs with free access to street). The analyses showed the multifactorial character of the epidemiology of canine leptospirosis.

Relevância:

100.00% 100.00%

Publicador:

Resumo:

Cultivated peanut (Arachis hypogaea) is an important crop, widely grown in tropical and subtropical regions of the world. It is highly susceptible to several biotic and abiotic stresses to which wild species are resistant. As a first step towards the introgression of these resistance genes into cultivated peanut, a linkage map based on microsatellite markers was constructed, using an F-2 population obtained from a cross between two diploid wild species with AA genome (A. duranensis and A. stenosperma). A total of 271 new microsatellite markers were developed in the present study from SSR-enriched genomic libraries, expressed sequence tags (ESTs), and by data-mining sequences available in GenBank. of these, 66 were polymorphic for cultivated peanut. The 271 new markers plus another 162 published for peanut were screened against both progenitors and 204 of these (47.1%) were polymorphic, with 170 codominant and 34 dominant markers. The 80 codominant markers segregating 1:2:1 (P < 0.05) were initially used to establish the linkage groups. Distorted and dominant markers were subsequently included in the map. The resulting linkage map consists of 11 linkage groups covering 1,230.89 cM of total map distance, with an average distance of 7.24 cM between markers. This is the first microsatellite-based map published for Arachis, and the first map based on sequences that are all currently publicly available. Because most markers used were derived from ESTs and genomic libraries made using methylation-sensitive restriction enzymes, about one-third of the mapped markers are genic. Linkage group ordering is being validated in other mapping populations, with the aim of constructing a transferable reference map for Arachis.