845 results for mining data streams


Relevance:

30.00%

Publisher:

Abstract:

A report produced by the Department of Natural Resources on the historical pattern the rivers take.

Relevance:

30.00%

Publisher:

Abstract:

Part of Iowa's Water Ambient Monitoring Program, produced by the Iowa Department of Natural Resources.

Relevance:

30.00%

Publisher:

Abstract:

Recently, kernel-based Machine Learning methods have gained great popularity in many data analysis and data mining fields: pattern recognition, biocomputing, speech and vision, engineering, remote sensing etc. The paper describes the use of kernel methods to approach the processing of large datasets from environmental monitoring networks. Several typical problems of the environmental sciences and their solutions provided by kernel-based methods are considered: classification of categorical data (soil type classification), mapping of environmental and pollution continuous information (pollution of soil by radionuclides), mapping with auxiliary information (climatic data from Aral Sea region). The promising developments, such as automatic emergency hot spot detection and monitoring network optimization are discussed as well.

Relevance:

30.00%

Publisher:

Abstract:

This report describes a statewide study conducted to develop main-channel slope (MCS) curves for 138 selected streams in Iowa with drainage areas greater than 100 square miles. MCS values determined from the curves can be used in regression equations for estimating flood frequency discharges. Multi-variable regression equations previously developed for two of the three hydrologic regions defined for Iowa require the measurement of MCS. Main-channel slope is a difficult measurement to obtain for large streams using 1:24,000-scale topographic maps. The curves developed in this report provide a simplified method for determining MCS values for sites located along large streams in Iowa within hydrologic Regions 2 and 3. The curves were developed using MCS values quantified for 2,058 selected sites along 138 selected streams in Iowa. A geographic information system (GIS) technique and 1:24,000-scale topographic data were used to quantify MCS values for the stream sites. The sites were selected at about 5-mile intervals along the streams. River miles were quantified for each stream site using a GIS program. Data points for river-mile and MCS values were plotted and a best-fit curve was developed for each stream. An adjustment was applied to all 138 curves to compensate for differences in MCS values between manual measurements and GIS quantification. The multi-variable equations for Regions 2 and 3 were developed using manual measurements of MCS. A comparison of manual measurements and GIS quantification of MCS indicates that manual measurements typically produce greater values of MCS compared to GIS quantification. Median differences between manual measurements and GIS quantification of MCS are 14.8 and 17.7 percent for Regions 2 and 3, respectively. 
Comparisons of percentage differences between flood-frequency discharges calculated using MCS values of manual measurements and GIS quantification indicate that use of GIS values of MCS for Region 3 substantially underestimates flood discharges. Mean and median percentage differences for 2- to 500-year recurrence-interval flood discharges ranged from 5.0 to 5.3 and 4.3 to 4.5 percent, respectively, for Region 2 and ranged from 18.3 to 27.1 and 12.3 to 17.3 percent for Region 3. The MCS curves developed from GIS quantification were adjusted by 14.8 percent for streams located in Region 2 and by 17.7 percent for streams located in Region 3. Comparisons of percentage differences between flood discharges calculated using MCS values of manual measurements and adjusted-GIS quantification for Regions 2 and 3 indicate that the flood-discharge estimates are comparable. For Region 2, mean percentage differences for 2- to 500-year recurrence-interval flood discharges ranged between 0.6 and 0.8 percent and median differences were 0.0 percent. For Region 3, mean and median differences ranged from 5.4 to 8.4 and 0.0 to 0.3 percent, respectively. A list of selected stream sites presented with each curve provides information about the sites including river miles, drainage areas, the location of U.S. Geological Survey streamflow-gage stations, and the location of streams crossing hydrologic region boundaries or the Des Moines Lobe landforms region boundary. Two examples are presented for determining river-mile and MCS values, and two techniques are presented for computing flood-frequency discharges.
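
The regional adjustment described in this abstract can be illustrated with a short sketch. The abstract gives the adjustment percentages (14.8 for Region 2, 17.7 for Region 3) and says manual measurements are typically larger than GIS values, but it does not say how the adjustment is applied. The code below assumes a simple multiplicative increase of the GIS value; the function name `adjust_gis_mcs` and that multiplicative form are illustrative assumptions, not the report's documented procedure.

```python
# Adjustment percentages quoted in the abstract, keyed by hydrologic region.
ADJUSTMENT_PERCENT = {2: 14.8, 3: 17.7}


def adjust_gis_mcs(gis_mcs: float, region: int) -> float:
    """Scale a GIS-derived main-channel slope (MCS) upward for a region.

    ASSUMPTION: the adjustment is applied as a multiplicative increase,
    gis_mcs * (1 + percent / 100). The abstract does not confirm this form.
    The slope's units are carried through unchanged.
    """
    if region not in ADJUSTMENT_PERCENT:
        raise ValueError(f"no adjustment quoted for region {region}")
    return gis_mcs * (1.0 + ADJUSTMENT_PERCENT[region] / 100.0)


if __name__ == "__main__":
    # Hypothetical GIS slope value, used only to show the call.
    print(adjust_gis_mcs(10.0, 2))
    print(adjust_gis_mcs(10.0, 3))
```

Only Regions 2 and 3 are accepted because the abstract quotes adjustments for those two regions alone.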

Relevance:

30.00%

Publisher:

Abstract:

Tests for bioaccessibility are useful in human health risk assessment. No research data with the objective of determining bioaccessible arsenic (As) in areas affected by gold mining and smelting activities have been published so far in Brazil. Samples were collected from four areas: a private natural land reserve of Cerrado; mine tailings; overburden; and refuse from gold smelting of a mining company in Paracatu, Minas Gerais. The total, bioaccessible and Mehlich-1-extractable As levels were determined. Based on the reproducibility and the accuracy/precision of the in vitro gastrointestinal (IVG) determination method of bioaccessible As in the reference material NIST 2710, it was concluded that this procedure is adequate to determine bioaccessible As in soil and tailing samples from gold mining areas in Brazil. All samples from the studied mining area contained low percentages of bioaccessible As.

Relevance:

30.00%

Publisher:

Abstract:

Digital information generates the possibility of a high degree of redundancy in the data available for fitting predictive models used for Digital Soil Mapping (DSM). Among these models, the Decision Tree (DT) technique has been increasingly applied due to its capacity of dealing with large datasets. The purpose of this study was to evaluate the impact of the data volume used to generate the DT models on the quality of soil maps. An area of 889.33 km² was chosen in the Northern region of the State of Rio Grande do Sul. The soil-landscape relationship was obtained from reambulation of the studied area and the alignment of the units in the 1:50,000 scale topographic mapping. Six predictive covariates linked to the factors soil formation, relief and organisms, together with data sets of 1, 3, 5, 10, 15, 20 and 25 % of the total data volume, were used to generate the predictive DT models in the data mining program Waikato Environment for Knowledge Analysis (WEKA). In this study, sample densities below 5 % resulted in models with lower power of capturing the complexity of the spatial distribution of the soil in the study area. The relation between the data volume to be handled and the predictive capacity of the models was best for samples between 5 and 15 %. For the models based on these sample densities, the collected field data indicated an accuracy of predictive mapping close to 70 %.

Relevance:

30.00%

Publisher:

Abstract:

Expression data contribute significantly to the biological value of the sequenced human genome, providing extensive information about gene structure and the pattern of gene expression. ESTs, together with SAGE libraries and microarray experiment information, provide a broad and rich view of the transcriptome. However, it is difficult to perform large-scale expression mining of the data generated by these diverse experimental approaches. Not only is the data stored in disparate locations, but there is frequent ambiguity in the meaning of terms used to describe the source of the material used in the experiment. Untangling semantic differences between the data provided by different resources is therefore largely reliant on the domain knowledge of a human expert. We present here eVOC, a system which associates labelled target cDNAs for microarray experiments, or cDNA libraries and their associated transcripts, with controlled terms in a set of hierarchical vocabularies. eVOC consists of four orthogonal controlled vocabularies suitable for describing the domains of human gene expression data including Anatomical System, Cell Type, Pathology and Developmental Stage. We have curated and annotated 7016 cDNA libraries represented in dbEST, as well as 104 SAGE libraries, with expression information, and provide this as an integrated, public resource that allows the linking of transcripts and libraries with expression terms. Both the vocabularies and the vocabulary-annotated libraries can be retrieved from http://www.sanbi.ac.za/evoc/. Several groups are involved in developing this resource with the aim of unifying transcript expression information.

Relevance:

30.00%

Publisher:

Abstract:

Drainage-basin and channel-geometry multiple-regression equations are presented for estimating design-flood discharges having recurrence intervals of 2, 5, 10, 25, 50, and 100 years at stream sites on rural, unregulated streams in Iowa. Design-flood discharge estimates determined by Pearson Type-III analyses using data collected through the 1990 water year are reported for the 188 streamflow-gaging stations used in either the drainage-basin or channel-geometry regression analyses. Ordinary least-squares multiple-regression techniques were used to identify selected drainage-basin and channel-geometry regions. Weighted least-squares multiple-regression techniques, which account for differences in the variance of flows at different gaging stations and for variable lengths in station records, were used to estimate the regression parameters. Statewide drainage-basin equations were developed from analyses of 164 streamflow-gaging stations. Drainage-basin characteristics were quantified using a geographic-information-system (GIS) procedure to process topographic maps and digital cartographic data. The significant characteristics identified for the drainage-basin equations included contributing drainage area, relative relief, drainage frequency, and 2-year, 24-hour precipitation intensity. The average standard errors of prediction for the drainage-basin equations ranged from 38.6% to 50.2%. The GIS procedure expanded the capability to quantitatively relate drainage-basin characteristics to the magnitude and frequency of floods for stream sites in Iowa and provides a flood-estimation method that is independent of hydrologic regionalization. Statewide and regional channel-geometry regression equations were developed from analyses of 157 streamflow-gaging stations. Channel-geometry characteristics were measured on site and on topographic maps. 
Statewide and regional channel-geometry regression equations that are dependent on whether a stream has been channelized were developed on the basis of bankfull and active-channel characteristics. The significant channel-geometry characteristics identified for the statewide and regional regression equations included bankfull width and bankfull depth for natural channels unaffected by channelization, and active-channel width for stabilized channels affected by channelization. The average standard errors of prediction ranged from 41.0% to 68.4% for the statewide channel-geometry equations and from 30.3% to 70.0% for the regional channel-geometry equations. Procedures provided for applying the drainage-basin and channel-geometry regression equations depend on whether the design-flood discharge estimate is for a site on an ungaged stream, an ungaged site on a gaged stream, or a gaged site. When both a drainage-basin and a channel-geometry regression-equation estimate are available for a stream site, a procedure is presented for determining a weighted average of the two flood estimates.
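
The abstract mentions a weighted average of the drainage-basin and channel-geometry estimates but does not give the weights. As a rough sketch only, the code below assumes inverse-variance weighting based on each equation's standard error of prediction. The function name `weighted_flood_estimate`, its parameters, and the weighting scheme are assumptions made for illustration and may differ from the report's actual procedure.

```python
def weighted_flood_estimate(
    q_basin: float,
    se_basin: float,
    q_channel: float,
    se_channel: float,
) -> float:
    """Combine two flood-discharge estimates for the same site.

    ASSUMPTION: each estimate is weighted by the inverse of its squared
    standard error of prediction (inverse-variance weighting). The
    abstract does not state which weighting the report uses.
    """
    if se_basin <= 0 or se_channel <= 0:
        raise ValueError("standard errors must be positive")
    w_basin = 1.0 / se_basin**2
    w_channel = 1.0 / se_channel**2
    return (q_basin * w_basin + q_channel * w_channel) / (w_basin + w_channel)


if __name__ == "__main__":
    # Hypothetical discharges and standard errors (percent), for illustration.
    print(weighted_flood_estimate(1000.0, 30.0, 2000.0, 60.0))
```

Under this assumption the estimate with the smaller standard error dominates, and equal standard errors reduce to a plain mean.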

Relevance:

30.00%

Publisher:

Abstract:

An automatic system was designed to concurrently measure stage and discharge for the purpose of developing stage-discharge ratings and high flow hydrographs on small streams. Stage, or gage height, is recorded by an analog-to-digital recorder and discharge is determined by the constant-rate tracer-dilution method. The system measures flow above a base stage set by the user. To test the effectiveness of the system and its components, eight systems, with a variety of equipment, were installed at crest-stage gaging stations across Iowa. A fluorescent dye, rhodamine-WT, was used as the tracer. Tracer-dilution discharge measurements were made during 14 flow periods at six stations from 1986 through 1988 water years. Ratings were developed at three stations with the aid of these measurements. A loop rating was identified at one station during rapidly-changing flow conditions. Incomplete mixing and dye loss to sediment apparently were problems at some stations. Stage hydrographs were recorded for 38 flows at seven stations. Limited data on background fluorescence during high flows were also obtained.
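
For readers unfamiliar with the constant-rate tracer-dilution method named in this abstract, the sketch below shows the standard steady-state mass balance behind it. Tracer of concentration C1 is injected at a constant rate q, and once the downstream concentration reaches a plateau C2 over a background Cb, discharge follows from q·C1 + Q·Cb = (Q + q)·C2. The function and parameter names are illustrative, and the sketch assumes complete mixing and no dye loss, which the abstract reports was not always the case at these stations.

```python
def dilution_discharge(
    injection_rate: float,
    injected_conc: float,
    plateau_conc: float,
    background_conc: float = 0.0,
) -> float:
    """Stream discharge from a constant-rate tracer injection.

    Solves q*C1 + Q*Cb = (Q + q)*C2 for Q, giving
    Q = q * (C1 - C2) / (C2 - Cb).
    ASSUMPTIONS: complete mixing at the sampling point and no tracer loss.
    Discharge is returned in the same units as injection_rate.
    """
    if plateau_conc <= background_conc:
        raise ValueError("plateau concentration must exceed background")
    return (
        injection_rate
        * (injected_conc - plateau_conc)
        / (plateau_conc - background_conc)
    )


if __name__ == "__main__":
    # Hypothetical values: injection rate in m^3/s, concentrations in ug/L.
    print(dilution_discharge(0.001, 1.0e6, 10.0, background_conc=2.0))
```

Background fluorescence enters the denominator, so an error in Cb matters most when the plateau concentration is only slightly above background.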

Relevance:

30.00%

Publisher:

Abstract:

The broad aim of biomedical science in the postgenomic era is to link genomic and phenotype information to allow deeper understanding of the processes leading from genomic changes to altered phenotype and disease. The EuroPhenome project (http://www.EuroPhenome.org) is a comprehensive resource for raw and annotated high-throughput phenotyping data arising from projects such as EUMODIC. EUMODIC is gathering data from the EMPReSSslim pipeline (http://www.empress.har.mrc.ac.uk/) which is performed on inbred mouse strains and knock-out lines arising from the EUCOMM project. The EuroPhenome interface allows the user to access the data via the phenotype or genotype. It also allows the user to access the data in a variety of ways, including graphical display, statistical analysis and access to the raw data via web services. The raw phenotyping data captured in EuroPhenome is annotated by an annotation pipeline which automatically identifies statistically different mutants from the appropriate baseline and assigns ontology terms for that specific test. Mutant phenotypes can be quickly identified using two EuroPhenome tools: PhenoMap, a graphical representation of statistically relevant phenotypes, and mining for a mutant using ontology terms. To assist with data definition and cross-database comparisons, phenotype data is annotated using combinations of terms from biological ontologies.

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: The annotation of protein post-translational modifications (PTMs) is an important task of UniProtKB curators and, with continuing improvements in experimental methodology, an ever greater number of articles are being published on this topic. To help curators cope with this growing body of information we have developed a system which extracts information from the scientific literature for the most frequently annotated PTMs in UniProtKB. RESULTS: The procedure uses a pattern-matching and rule-based approach to extract sentences with information on the type and site of modification. A ranked list of protein candidates for the modification is also provided. For PTM extraction, precision varies from 57% to 94%, and recall from 75% to 95%, according to the type of modification. The procedure was used to track new publications on PTMs and to recover potential supporting evidence for phosphorylation sites annotated based on the results of large scale proteomics experiments. CONCLUSIONS: The information retrieval and extraction method we have developed in this study forms the basis of a simple tool for the manual curation of protein post-translational modifications in UniProtKB/Swiss-Prot. Our work demonstrates that even simple text-mining tools can be effectively adapted for database curation tasks, providing that a thorough understanding of the working process and requirements are first obtained. This system can be accessed at http://eagl.unige.ch/PTM/.
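
The precision and recall figures quoted in this abstract follow the usual definitions for extraction tasks, sketched below. The counts in the usage example are hypothetical and are not taken from the paper; `precision_recall` is an illustrative helper name.

```python
from typing import Tuple


def precision_recall(
    true_pos: int, false_pos: int, false_neg: int
) -> Tuple[float, float]:
    """Return (precision, recall) from extraction counts.

    precision = TP / (TP + FP): share of extracted items that are correct.
    recall    = TP / (TP + FN): share of correct items that were extracted.
    """
    if true_pos + false_pos == 0 or true_pos + false_neg == 0:
        raise ValueError("precision or recall is undefined for these counts")
    precision = true_pos / (true_pos + false_pos)
    recall = true_pos / (true_pos + false_neg)
    return precision, recall


if __name__ == "__main__":
    # Hypothetical counts for one modification type, for illustration only.
    p, r = precision_recall(true_pos=94, false_pos=6, false_neg=5)
    print(f"precision={p:.2%} recall={r:.2%}")
```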

Relevance:

30.00%

Publisher:

Abstract:

The book presents the state of the art in machine learning algorithms (artificial neural networks of different architectures, support vector machines, etc.) as applied to the classification and mapping of spatially distributed environmental data. Basic geostatistical algorithms are presented as well. New trends in machine learning and their application to spatial data are given, and real case studies based on environmental and pollution data are carried out. The book provides a CD-ROM with the Machine Learning Office software, including sample sets of data, that will allow both students and researchers to put the concepts rapidly to practice.

Relevance:

30.00%

Publisher:

Abstract:

This article briefly presents the chapters of an interdisciplinary study that seeks to explain the context of the ban on iron mining in Goa at the end of 2012 and to provide the information needed to guide and manage future decision-making on mining activity. The first six chapters study the abiotic environment, the biotic environment, material flows, social aspects, economic aspects and, finally, political aspects. The last two chapters, in turn, assess and manage the environmental impacts of mining: one through a DPSIR analysis, and the other by proposing three scenarios that integrate the different variables and encourage participation in decision-making. Extensive research was carried out through data collection, interviews and visits to the study areas of interest in order to understand the mining conflict in Goa.

Relevance:

30.00%

Publisher:

Abstract:

The main objective of this Master’s thesis is to learn more about Girona’s image as a tourism destination from the perspectives of different agents and to study how those agents differ in promotion or opinion. To meet this objective, three components of Girona’s destination image are studied: the attribute-based component, the holistic component, and the affective component. Although tourism destination image has been researched extensively, far less work addresses Girona as a destination. Some studies have already focused on Girona as a tourist destination, but they used a different type of sample and different methodological steps. This study is new among destination studies in that it is based only on online textual data and follows a methodology based on text mining, which allows relevant information to be extracted from texts. After this information is extracted, multivariate statistical analyses are carried out to discover more about Girona’s tourism image.

Relevance:

30.00%

Publisher:

Abstract:

A statewide study was conducted to develop regression equations for estimating flood-frequency discharges for ungaged stream sites in Iowa. Thirty-eight selected basin characteristics were quantified and flood-frequency analyses were computed for 291 streamflow-gaging stations in Iowa and adjacent States. A generalized-skew-coefficient analysis was conducted to determine whether generalized skew coefficients could be improved for Iowa. Station skew coefficients were computed for 239 gaging stations in Iowa and adjacent States, and an isoline map of generalized-skew-coefficient values was developed for Iowa using variogram modeling and kriging methods. The skew map provided the lowest mean square error for the generalized-skew-coefficient analysis and was used to revise generalized skew coefficients for flood-frequency analyses for gaging stations in Iowa. Regional regression analysis, using generalized least-squares regression and data from 241 gaging stations, was used to develop equations for three hydrologic regions defined for the State. The regression equations can be used to estimate flood discharges that have recurrence intervals of 2, 5, 10, 25, 50, 100, 200, and 500 years for ungaged stream sites in Iowa. One-variable equations were developed for each of the three regions and multi-variable equations were developed for two of the regions. Two sets of equations are presented for two of the regions because one-variable equations are considered easy for users to apply and the predictive accuracies of multi-variable equations are greater. Standard error of prediction for the one-variable equations ranges from about 34 to 45 percent and for the multi-variable equations ranges from about 31 to 42 percent. A region-of-influence regression method was also investigated for estimating flood-frequency discharges for ungaged stream sites in Iowa.
A comparison of regional and region-of-influence regression methods, based on ease of application and root mean square errors, determined the regional regression method to be the better estimation method for Iowa. Techniques for estimating flood-frequency discharges for streams in Iowa are presented for determining (1) regional regression estimates for ungaged sites on ungaged streams; (2) weighted estimates for gaged sites; and (3) weighted estimates for ungaged sites on gaged streams. The technique for determining regional regression estimates for ungaged sites on ungaged streams requires determining which of four possible examples applies to the location of the stream site and its basin. Illustrations for determining which example applies to an ungaged stream site and for applying both the one-variable and multi-variable regression equations are provided for the estimation techniques.