853 results for Data stream mining


Relevance:

30.00%

Publisher:

Abstract:

Data mining, frequent pattern mining, database mining, mining algorithms in SQL

Relevance:

30.00%

Publisher:

Abstract:

Visual data mining, multi-dimensional scaling, POLARMAP, Sammon's mapping, clustering, outlier detection

Relevance:

30.00%

Publisher:

Abstract:

University of Magdeburg, Faculty of Computer Science, dissertation, 2013

Relevance:

30.00%

Publisher:

Abstract:

This paper analyses the relationship between mesohabitats and aquatic oligochaete species in the Galharada Stream (Campos do Jordão State Park, state of São Paulo, Brazil). Between August 2005 and May 2006, a total of 192 samples were obtained in areas of four different mesohabitats: riffle leaf litter (RL), pool leaf litter (PL), pool sediment (PS) and interstitial sediment from rocky beds in riffle areas (IS). In the mesohabitats sampled, 2,007 specimens were identified, belonging to two families (Naididae and Enchytraeidae). Among the oligochaetes identified, Naididae was represented by six genera (Allonais, Chaetogaster, Nais, Pristina, Aulodrilus and Limnodrilus). Principal components analysis (PCA) showed that the first two axes explained 85.1% of the total variance of the data. Limnodrilus hoffmeisteri Claparede, 1862 and Aulodrilus limnobius Bretscher, 1899 were associated with the pool areas (PL and PS). Most species of the genera Pristina and Nais showed an apparent affinity with the riffle mesohabitats. The Indicator Species Analysis (IndVal) revealed that Nais communis Piguet, 1906, Pristina leidyi Smith, 1896 and Pristina (Pristinella) jenkinae (Stephenson, 1931) are indicative of the RL mesohabitat, while the family Enchytraeidae was considered indicative of the PL mesohabitat.

Relevance:

30.00%

Publisher:

Abstract:

This paper uses data on the world's copper mining industry to measure the impact on efficiency of the adoption of the ISO 14001 environmental standard. Anecdotal and case study literature suggests that firms are motivated to adopt this standard so as to achieve greater efficiency through changes in operating procedures and processes. Using plant-level panel data from 1992-2007 on most of the world's industrial copper mines, the study uses stochastic frontier methods to investigate the effects of ISO adoption. Across the variety of models used in this study, adoption either tends to improve efficiency or has no impact on it; no evidence is found that ISO adoption decreases efficiency.

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: Spain shows the highest bladder cancer incidence rates in men among European countries. The most important risk factors are tobacco smoking and occupational exposure to a range of different chemical substances, such as aromatic amines. METHODS: This paper describes the municipal distribution of bladder cancer mortality and attempts to "adjust" this spatial pattern for the prevalence of smokers, using the autoregressive spatial model proposed by Besag, York and Mollié, with relative risk of lung cancer mortality as a surrogate. RESULTS: It has been possible to compile and ascertain the posterior distribution of relative risk for bladder cancer adjusted for lung cancer mortality, on the basis of a single Bayesian spatial model covering all of Spain's 8077 towns. Maps were plotted depicting smoothed relative risk (RR) estimates, and the distribution of the posterior probability of RR>1 by sex. Towns that registered the highest relative risks for both sexes were mostly located in the Provinces of Cadiz, Seville, Huelva, Barcelona and Almería. The highest-risk area in Barcelona Province corresponded to very specific municipal areas in the Bages district, e.g., Suría, Sallent, Balsareny, Manresa and Cardona. CONCLUSION: Mining/industrial pollution and the risk entailed in certain occupational exposures could in part be dictating the pattern of municipal bladder cancer mortality in Spain. Population exposure to arsenic is a matter that calls for attention. It would be of great interest if the relationship between the chemical quality of drinking water and the frequency of bladder cancer could be studied.

Relevance:

30.00%

Publisher:

Abstract:

Gold-mining may play an important role in the maintenance of malaria worldwide. Gold-mining, mostly illegal, has significantly expanded in Colombia during the last decade in areas with limited health care and disease prevention. We report a descriptive study that was carried out to determine the malaria prevalence in gold-mining areas of Colombia, using data from the public health surveillance system (National Health Institute) during the period 2010-2013. Gold-mining was more prevalent in the departments of Antioquia, Córdoba, Bolívar, Chocó, Nariño, Cauca, and Valle, which contributed 89.3% (270,753 cases) of the national malaria incidence from 2010 to 2013; 31.6% of malaria cases were from mining areas. Mining regions, such as El Bagre, Zaragoza, and Segovia, in Antioquia, Puerto Libertador and Montelíbano, in Córdoba, and Buenaventura, in Valle del Cauca, were the most endemic areas. The annual parasite index (API) correlated with gold production (R² = 0.82, p < 0.0001); for every 100 kg of gold produced, the API increased by 0.54 cases per 1,000 inhabitants. Lack of malaria control activities, together with high migration and proliferation of mosquito breeding sites, contribute to malaria in gold-mining regions. Specific control activities must be introduced to control this significant source of malaria in Colombia.
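
The reported slope can be read as simple arithmetic. The sketch below is illustrative only: it assumes the relationship is linear over the range of interest and uses just the slope stated in the abstract (0.54 API cases per 1,000 inhabitants per 100 kg of gold). The function name `api_increase` is hypothetical, and because the abstract gives no intercept, the result is a change in API, not an absolute level.

```python
# Slope stated in the abstract: +0.54 API cases per 1,000 inhabitants
# for every 100 kg of gold produced. No intercept is reported, so this
# sketch can only express a change in API, not its absolute value.
API_PER_100_KG = 0.54


def api_increase(gold_kg: float) -> float:
    """Change in API (cases per 1,000 inhabitants) for `gold_kg` of gold.

    Assumes the linear relationship holds across the whole range, which
    the abstract does not guarantee.
    """
    if gold_kg < 0:
        raise ValueError("gold production cannot be negative")
    return API_PER_100_KG * gold_kg / 100.0


if __name__ == "__main__":
    # Hypothetical production figures, chosen only for illustration.
    for kg in (100, 250, 1000):
        print(f"{kg} kg -> +{api_increase(kg):.2f} cases per 1,000 inhabitants")
```

With R² = 0.82, roughly 18% of the variance in API is not explained by gold production, so figures computed this way are indicative at best.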

Relevance:

30.00%

Publisher:

Abstract:

The objective of the PANACEA ICT-2007.2.2 EU project is to build a platform that automates the stages involved in the acquisition, production, updating and maintenance of the large language resources required by, among others, MT systems. The development of a Corpus Acquisition Component (CAC) for extracting monolingual and bilingual data from the web is one of the most innovative building blocks of PANACEA. The CAC, which is the first stage in the PANACEA pipeline for building Language Resources, adopts an efficient and distributed methodology to crawl for web documents with rich textual content in specific languages and predefined domains. The CAC includes modules that can acquire parallel data from sites with in-domain content available in more than one language. In order to extrinsically evaluate the CAC methodology, we have conducted several experiments that used crawled parallel corpora for the identification and extraction of parallel sentences using sentence alignment. The corpora were then successfully used for domain adaptation of Machine Translation Systems.

Relevance:

30.00%

Publisher:

Abstract:

Recently, kernel-based Machine Learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing, etc. The paper describes the use of kernel methods to process large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of continuous environmental and pollution information (pollution of soil by radionuclides), and mapping with auxiliary information (climatic data from the Aral Sea region). Promising developments, such as automatic emergency hot spot detection and monitoring network optimization, are discussed as well.

Relevance:

30.00%

Publisher:

Abstract:

Clandestine sand extraction, along coastal strips and in stream beds, has been practised by many Cape Verdean households. In recent decades, Calhetona beach (Santiago Island) has been one of many sites to suffer significant environmental degradation, because this activity has been carried out without any plans for extraction or for subsequent rehabilitation of the degraded areas. Combining survey data, direct observation, and documentary and bibliographic research, this study aimed to characterise the community (living in the Ponta Calhetona neighbourhood) that extracts sand at Calhetona beach, describe the dynamics of the extractive activity, assess the community's perception of the consequences of its activity, and describe the environmental impact resulting from sand extraction. Analysis of the surveys, conducted in February 2012 with 25 heads of household who extract sand at Calhetona beach, shows that they are mostly women, predominantly aged between 40 and 59, homemakers, with little schooling, with large and/or extended families in their care, and who have been extracting sand for more than 10 years. Faced with economic vulnerability, lack of employment and high demand for sand for construction, the respondents see this activity as a source of income. However, the return from this difficult and potentially dangerous activity is small. Those who actually benefit are the truck drivers, who buy the sand from those who extract it and sell it to the final consumer at twice the price. The respondents show a general awareness of the various negative environmental impacts of their activity, but argue that sand extraction is one of the few alternatives available for supporting their households.
A comparison of the current state of Calhetona beach with local residents' accounts of its past characteristics shows that over the last 40-50 years, since intensive sand extraction began on this beach, its physical condition has clearly deteriorated. This degradation is characterised mainly by retreat of the coastline, the near absence of sand, and salinisation of soils near the beach, in addition to the consequent negative impacts on turtle nesting and beach tourism.

Relevance:

30.00%

Publisher:

Abstract:

Intensification of agricultural production without sound management and regulation can lead to severe environmental problems, as in Western Santa Catarina State, Brazil, where intensive swine production has caused large accumulations of manure and consequently water pollution. Natural resource scientists are asked by decision-makers for advice on management and regulatory decisions. Distributed environmental models are useful tools, since they can be used to explore consequences of various management practices. However, in many areas of the world, quantitative data for model calibration and validation are lacking. The data-intensive distributed environmental model AgNPS was applied in a data-poor environment, the upper catchment (2,520 ha) of the Ariranhazinho River, near the city of Seara, in Santa Catarina State. Steps included data preparation, cell size selection, sensitivity analysis, model calibration and application to different management scenarios. The model was calibrated based on a best guess for model parameters and on a pragmatic sensitivity analysis. The parameters were adjusted to match model outputs (runoff volume, peak runoff rate and sediment concentration) closely with the sparse observed data. A modelling grid cell resolution of 150 m gave appropriate results at a manageable computational cost. The rainfall runoff response of the AgNPS model was calibrated using three separate rainfall ranges (< 25, 25-60, > 60 mm). Predicted sediment concentrations were consistently six to ten times higher than observed, probably due to sediment trapping along vegetated channel banks. Predicted N and P concentrations in stream water ranged from just below to well above regulatory norms. Expert knowledge of the area, in addition to experience reported in the literature, was able to compensate in part for limited calibration data.
Several scenarios (actual, recommended and excessive manure applications, and point source pollution from swine operations) could be compared by the model, using a relative ranking rather than quantitative predictions.

Relevance:

30.00%

Publisher:

Abstract:

Tests for bioaccessibility are useful in human health risk assessment. No research data with the objective of determining bioaccessible arsenic (As) in areas affected by gold mining and smelting activities have been published so far in Brazil. Samples were collected from four areas: a private natural land reserve of Cerrado; mine tailings; overburden; and refuse from gold smelting of a mining company in Paracatu, Minas Gerais. The total, bioaccessible and Mehlich-1-extractable As levels were determined. Based on the reproducibility and the accuracy/precision of the in vitro gastrointestinal (IVG) determination method of bioaccessible As in the reference material NIST 2710, it was concluded that this procedure is adequate to determine bioaccessible As in soil and tailing samples from gold mining areas in Brazil. All samples from the studied mining area contained low percentages of bioaccessible As.

Relevance:

30.00%

Publisher:

Abstract:

Digital information generates the possibility of a high degree of redundancy in the data available for fitting predictive models used for Digital Soil Mapping (DSM). Among these models, the Decision Tree (DT) technique has been increasingly applied due to its capacity for dealing with large datasets. The purpose of this study was to evaluate the impact of the data volume used to generate the DT models on the quality of soil maps. An area of 889.33 km² was chosen in the Northern region of the State of Rio Grande do Sul. The soil-landscape relationship was obtained from reambulation of the studied area and the alignment of the units in the 1:50,000 scale topographic mapping. Six predictive covariates linked to the soil formation factors relief and organisms, together with data sets of 1, 3, 5, 10, 15, 20 and 25 % of the total data volume, were used to generate the predictive DT models in the data mining program Waikato Environment for Knowledge Analysis (WEKA). In this study, sample densities below 5 % resulted in models that were less able to capture the complexity of the spatial distribution of the soil in the study area. The trade-off between the data volume to be handled and the predictive capacity of the models was best for samples between 5 and 15 %. For the models based on these sample densities, the collected field data indicated an accuracy of predictive mapping close to 70 %.
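
As a rough illustration of the sampling design described above, the following sketch draws training subsets at the seven stated densities from a pool of records. It is not the authors' procedure: the abstract does not say how the subsets were drawn, so simple random sampling without replacement is an assumption here, and `draw_training_subsets` and the integer stand-in records are hypothetical.

```python
import random

# Sample densities named in the abstract, as fractions of the total data volume.
FRACTIONS = [0.01, 0.03, 0.05, 0.10, 0.15, 0.20, 0.25]


def draw_training_subsets(records, fractions=FRACTIONS, seed=0):
    """Return {fraction: subset} using simple random sampling without replacement.

    Assumption: the study's actual sampling scheme is not given in the
    abstract; each subset here is drawn independently from the full pool.
    """
    rng = random.Random(seed)  # fixed seed so the draw is repeatable
    subsets = {}
    for fraction in fractions:
        k = max(1, round(len(records) * fraction))
        subsets[fraction] = rng.sample(records, k)
    return subsets


# Stand-in for pixel or point records; real inputs would carry covariates and a soil class.
records = list(range(10_000))
subsets = draw_training_subsets(records)

if __name__ == "__main__":
    for fraction, subset in subsets.items():
        print(f"{fraction:.0%} of the data -> {len(subset)} training records")
```

Each subset could then be used to fit a separate decision tree, and the resulting maps compared against field data, which is the comparison the abstract reports.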

Relevance:

30.00%

Publisher:

Abstract:

Expression data contribute significantly to the biological value of the sequenced human genome, providing extensive information about gene structure and the pattern of gene expression. ESTs, together with SAGE libraries and microarray experiment information, provide a broad and rich view of the transcriptome. However, it is difficult to perform large-scale expression mining of the data generated by these diverse experimental approaches. Not only is the data stored in disparate locations, but there is frequent ambiguity in the meaning of terms used to describe the source of the material used in the experiment. Untangling semantic differences between the data provided by different resources is therefore largely reliant on the domain knowledge of a human expert. We present here eVOC, a system which associates labelled target cDNAs for microarray experiments, or cDNA libraries and their associated transcripts, with controlled terms in a set of hierarchical vocabularies. eVOC consists of four orthogonal controlled vocabularies suitable for describing the domains of human gene expression data including Anatomical System, Cell Type, Pathology and Developmental Stage. We have curated and annotated 7016 cDNA libraries represented in dbEST, as well as 104 SAGE libraries, with expression information, and provide this as an integrated, public resource that allows the linking of transcripts and libraries with expression terms. Both the vocabularies and the vocabulary-annotated libraries can be retrieved from http://www.sanbi.ac.za/evoc/. Several groups are involved in developing this resource with the aim of unifying transcript expression information.