802 results for Data stream mining
Abstract:
BACKGROUND: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, in which no member has complete knowledge of the others' fields. As a result, knowledge exchange may often be marked by miscommunication that leads to misinterpretation and, ultimately, to errors in research and even in clinical practice. Although communication plays a central role in interdisciplinary collaboration and miscommunication can harm research processes, to the best of our knowledge no study has yet explored how data analysis specialists and clinical researchers communicate over time. METHODS/PRINCIPAL FINDINGS: We conducted a qualitative analysis of encounters between clinical researchers and data analysis specialists (an epidemiologist, a clinical epidemiologist, and a data mining specialist). These encounters were recorded and systematically analyzed using grounded theory methodology to extract emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using system dynamics methodology to look for potential interventions to improve this process. Four major themes emerged. Definitions in lay language were frequently used to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how a method or piece of information from the other field would behave if exposed to alternative situations, ultimately helping to explain its main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, helping specialists understand the current context in light of the final goal.
CONCLUSION/SIGNIFICANCE: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.
Abstract:
Mountaintop mining (MTM) is the primary method of surface coal extraction in the central Appalachian region of the eastern United States, and it is known to contaminate streams in local watersheds. In this study, we measured the chemical and isotopic compositions of water samples from MTM-impacted tributaries and streams in the Mud River watershed in West Virginia. We systematically document the isotopic compositions of three major constituents: sulfur isotopes in sulfate (δ³⁴S-SO₄), carbon isotopes in dissolved inorganic carbon (δ¹³C-DIC), and strontium isotopes (⁸⁷Sr/⁸⁶Sr). The data show that δ³⁴S-SO₄, δ¹³C-DIC, Sr/Ca, and ⁸⁷Sr/⁸⁶Sr measured in saline- and selenium-rich MTM-impacted tributaries are distinguishable from those of surface water upstream of mining impacts. These tracers can therefore be used to delineate and quantify the impact of MTM in watersheds. High Sr/Ca and low ⁸⁷Sr/⁸⁶Sr characterize tributaries originating from active MTM areas, whereas tributaries from reclaimed MTM areas had low Sr/Ca and high ⁸⁷Sr/⁸⁶Sr. Leaching experiments on rocks from the watershed show that pyrite oxidation and carbonate dissolution control the solute chemistry, with distinct ⁸⁷Sr/⁸⁶Sr ratios characterizing different rock sources. We propose that MTM operations that access the deeper Kanawha Formation generate residual mined rocks in valley fills, from which effluents with distinctive ⁸⁷Sr/⁸⁶Sr and Sr/Ca imprints affect the quality of Appalachian watersheds.
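One common way to turn such tracers into a quantitative estimate is a two-end-member mixing model. The sketch below is an illustration under stated assumptions, not the study's method: it assumes conservative mixing of a hypothetical MTM-impacted tributary (end-member A) with upstream water (end-member B), weights ⁸⁷Sr/⁸⁶Sr by Sr concentration (an approximation that ignores small differences in ⁸⁶Sr abundance), and solves for the fraction of tributary water. The function names and all numbers are hypothetical.

```python
def mix(f_a: float, c_a: float, r_a: float, c_b: float, r_b: float) -> tuple[float, float]:
    """Forward model: Sr concentration and 87Sr/86Sr of a mixture that contains
    a fraction f_a of end-member A (by water volume) and 1 - f_a of end-member B."""
    c_mix = f_a * c_a + (1.0 - f_a) * c_b
    # Each end-member contributes to the ratio in proportion to the Sr it supplies.
    r_mix = (f_a * c_a * r_a + (1.0 - f_a) * c_b * r_b) / c_mix
    return c_mix, r_mix


def fraction_from_ratio(r_mix: float, c_a: float, r_a: float, c_b: float, r_b: float) -> float:
    """Invert the mixing model: fraction of end-member A implied by an observed ratio."""
    num = c_b * (r_mix - r_b)
    den = c_b * (r_mix - r_b) + c_a * (r_a - r_mix)
    if den == 0:
        raise ValueError("mixing fraction is undetermined for these end-members")
    return num / den
```

With hypothetical end-members (A: 2.0 mg/L Sr, ratio 0.7100; B: 0.1 mg/L Sr, ratio 0.7150), a 30% admixture of A pulls the mixture's ratio close to A's because A dominates the Sr budget, and `fraction_from_ratio` recovers the 0.3 fraction from that ratio.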
Abstract:
An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. In many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises face the need to optimize operation scheduling, improve resource utilization, discover useful knowledge, and make data-driven decisions.
This thesis research focuses on real-time optimization and knowledge discovery, addressing workflow optimization, resource allocation, and data-driven prediction of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable, real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.
On-demand digital-print service is a representative enterprise requiring a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital-print-service provider (PSP), to evaluate our optimization algorithms.
To handle the growing volume and diversity of demand, we first present a high-performance, scalable, real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order-dispatching sequence and balance resource utilization. Compared with prior work, this solution scales to high order volumes and provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential to reduce production inefficiencies and enhance enterprise productivity.
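The abstract does not describe the IGA's internals, so the following is only a minimal sketch of a genetic algorithm over dispatch sequences. It assumes a single shared production line, uses total weighted tardiness as the cost (the thesis's resource-balancing objective is not modeled), and interprets "incremental" as warm-starting from sequences found in an earlier run, appending newly arrived orders to them. `Order`, `evolve`, and every parameter value are hypothetical.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class Order:
    oid: int
    proc_time: float  # hypothetical processing time on a single shared line
    due: float        # due date, in the same time units
    weight: float = 1.0


def weighted_tardiness(seq: list) -> float:
    """Cost of dispatching the orders in `seq` one after another."""
    t, cost = 0.0, 0.0
    for o in seq:
        t += o.proc_time
        cost += o.weight * max(0.0, t - o.due)
    return cost


def order_crossover(p1: list, p2: list, rng: random.Random) -> list:
    """Keep a random slice of p1 in place; fill the other positions in p2's order."""
    if len(p1) < 2:
        return p1[:]
    i, j = sorted(rng.sample(range(len(p1)), 2))
    middle = p1[i:j]
    rest = [o for o in p2 if o not in middle]
    return rest[:i] + middle + rest[i:]


def swap_mutation(seq: list, rng: random.Random, rate: float = 0.2) -> list:
    seq = seq[:]
    if len(seq) > 1 and rng.random() < rate:
        a, b = rng.sample(range(len(seq)), 2)
        seq[a], seq[b] = seq[b], seq[a]
    return seq


def evolve(orders: list, seeds=(), pop_size: int = 30,
           generations: int = 100, seed: int = 0) -> list:
    """Elitist GA over dispatch sequences. `seeds` are sequences from an earlier
    run; orders they lack are appended (the assumed 'incremental' step) and
    orders no longer pending are dropped."""
    rng = random.Random(seed)
    pop = []
    for s in seeds:
        kept = [o for o in s if o in orders]
        pop.append(kept + [o for o in orders if o not in kept])
    while len(pop) < pop_size:
        pop.append(rng.sample(orders, len(orders)))
    for _ in range(generations):
        pop.sort(key=weighted_tardiness)
        elite = pop[: max(2, pop_size // 2)]  # best half survives unchanged
        children = []
        while len(elite) + len(children) < pop_size:
            p1, p2 = rng.sample(elite, 2)
            children.append(swap_mutation(order_crossover(p1, p2, rng), rng))
        pop = elite + children
    return min(pop, key=weighted_tardiness)
```

Because the elite half is carried over unchanged, the best cost never gets worse than the warm-start sequences, which is what makes reusing an earlier schedule attractive when a few new orders arrive.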
We next discuss the analysis and prediction of attributes of the hierarchical components of an enterprise, starting with the fundamental processes relevant to real-time prediction. Our process-execution-time and process-status prediction models integrate statistical methods with machine-learning algorithms. Besides improving prediction accuracy compared with stand-alone machine-learning algorithms, these models also provide a probabilistic estimate of the predicted status. An order generally consists of multiple serial and parallel processes, so we next introduce an order-fulfillment prediction model that combines the advantages of multiple classification models through flexible decision-integration mechanisms. Experimental results show that adopting the due dates recommended by the model can significantly reduce the enterprise's late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We decompose time-series data into components according to their hierarchical periodic nature, perform correlation analysis, and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to significantly increasing mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.
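The decompose-predict-aggregate strategy can be illustrated with a deliberately simple additive model. This sketch assumes a linear trend plus a single periodic component of known length (the thesis uses hierarchical periodicities and richer univariate and multivariate models); each component is forecast separately and the forecasts are summed. The function names are hypothetical.

```python
import numpy as np


def decompose(y: np.ndarray, period: int):
    """Additive decomposition: linear trend + periodic profile + residual."""
    t = np.arange(len(y))
    slope, intercept = np.polyfit(t, y, 1)
    trend = slope * t + intercept
    detrended = y - trend
    # Mean detrended value at each phase of the cycle, centred on zero.
    profile = np.array([detrended[p::period].mean() for p in range(period)])
    profile -= profile.mean()
    residual = detrended - profile[t % period]
    return (slope, intercept), profile, residual


def forecast(y: np.ndarray, period: int, horizon: int) -> np.ndarray:
    """Predict each component separately, then aggregate by summation."""
    (slope, intercept), profile, residual = decompose(y, period)
    t_future = np.arange(len(y), len(y) + horizon)
    trend_pred = slope * t_future + intercept          # extrapolate the trend
    seasonal_pred = profile[t_future % period]         # repeat the cycle
    residual_pred = np.full(horizon, residual.mean())  # naive mean for the remainder
    return trend_pred + seasonal_pred + residual_pred
```

The design point is that each component gets a model suited to its own behaviour (here trivially: extrapolation, repetition, and a mean), and swapping any one of them for a stronger model does not affect the others.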
In summary, this thesis research has produced a set of characterization, optimization, and prediction tools that enable an EIS to derive insightful knowledge from data and use it to guide production management. These tools are expected to help enterprises increase reconfigurability, automate more procedures, and obtain data-driven recommendations for effective decisions.
Abstract:
Selenium (Se) is a micronutrient necessary for the function of a variety of important enzymes, yet it has a narrow concentration range between essentiality and toxicity. Oviparous vertebrates such as birds and fish are especially sensitive to Se toxicity, which causes reproductive impairment and defects in embryo development. Selenium occurs naturally in the Earth's crust, but it can be mobilized by a variety of anthropogenic activities, including agricultural practices, coal burning, and mining.
Mountaintop removal/valley fill (MTR/VF) coal mining is a form of surface mining found throughout central Appalachia in the United States that involves blasting off the tops of mountains to access underlying coal seams. Spoil rock from the mountain is placed into adjacent valleys, forming valley fills, which bury stream headwaters and negatively impact surface water quality. This research focused on the biological impacts of Se leached from MTR/VF coal mining operations located around the Mud River, West Virginia.
To assess the status of Se in a lotic (flowing) system such as the Mud River, surface water, insects, and fish, including creek chub (Semotilus atromaculatus) and green sunfish (Lepomis cyanellus), were collected from a mining-impacted site and from a reference site not impacted by mining. Samples from the mined site showed higher conductivity and Se in surface water than the reference site, as well as higher Se concentrations in insects and fish. Histological analysis of gills from mined-site fish showed an absence of normal parasites, suggesting that parasite populations may be disrupted by poor water quality. X-ray absorption near-edge structure (XANES) spectroscopy was used to determine the speciation of Se in insect and creek chub samples. Insects contained approximately 40-50% inorganic Se (selenate and selenite) and 50-60% organic Se (Se-methionine and Se-cystine), whereas fish tissues contained lower proportions of inorganic Se than insects and higher proportions of organic Se in the forms of methyl-Se-cysteine, Se-cystine, and Se-methionine.
Otoliths, calcified inner-ear structures, were also collected from Mud River creek chubs and green sunfish and analyzed for Se content using laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS). Otolith Se concentrations differed significantly between the two fish species. Green sunfish otoliths from all sites contained background or low Se concentrations (< 1 µg/g) that did not differ significantly between mined and unmined sites. In contrast, creek chub otoliths from the historically mined site contained much higher Se concentrations (≥ 5 µg/g, up to approximately 68 µg/g) than those of the same species at the unmined site or those of green sunfish. Otolith Se concentrations were related to muscle Se concentrations in creek chubs (R² = 0.54, p = 0.0002 for Se in the last 20% of the otolith versus muscle Se), whereas no relationship was observed for green sunfish.
Additional experiments with biofilms grown in the Mud River showed higher Se in mined-site biofilms than in reference-site biofilms. When fathead minnows (Pimephales promelas) were fed these biofilms in the laboratory, they accumulated higher Se concentrations in liver and ovary tissues than fathead minnows fed reference-site biofilms. No differences in muscle Se accumulation were found between the two treatment groups. Biofilms were also centrifuged and separated into filamentous green algae and the remaining diatom fraction. Most of the Se was found in the diatom fraction, with only about one-third of the total biofilm Se present in the filamentous green algae fraction.
Finally, zebrafish (Danio rerio) embryos were exposed to aqueous Se as selenate, selenite, or L-selenomethionine to determine whether oxidative stress plays a role in Se embryo toxicity. Selenate and selenite exposure did not induce embryo deformities (lordosis and craniofacial malformation). L-selenomethionine, however, induced significantly higher deformity rates at 100 µg/L than in controls. Antioxidant rescue of L-selenomethionine-induced deformities was attempted using N-acetylcysteine (NAC). Pretreatment with NAC significantly reduced deformities in zebrafish embryos subsequently treated with L-selenomethionine, suggesting that oxidative stress may play a role in Se toxicity. Selenite exposure also induced a 6.6-fold increase in expression of the glutathione-S-transferase pi class 2 gene, which is involved in xenobiotic transformation. No changes in gene expression were observed in selenate- or L-selenomethionine-exposed embryos.
The findings in this dissertation contribute to the understanding of how Se bioaccumulates in a lotic system and is transferred through a simulated foodweb in addition to further exploring oxidative stress as a potential mechanism for Se-induced embryo toxicity. Future studies should continue to pursue the role of oxidative stress and other mechanisms in Se toxicity and the biotransformation of Se in aquatic ecosystems.
Abstract:
Many factors, such as poverty, ineffective institutions, and environmental regulations, may prevent developing countries from managing how natural resources are extracted to meet strong market demand. Extraction of some resources has reached such proportions that evidence is measurable from space. We present recent evidence of the global demand for a single commodity and the ecosystem destruction resulting from its extraction, recorded by satellites for one of the most biodiverse areas of the world. We find that since 2003, mining deforestation in Madre de Dios, Peru, has been increasing nonlinearly alongside a constant annual rate of increase in the international gold price (∼18%/yr). We detect that the new pattern of mining deforestation (1915 ha/yr, 2006-2009) is outpacing that of nearby settlement deforestation. We show that the gold price is linked with exponential increases in Peruvian national mercury imports over time (R² = 0.93, p = 0.04, 2003-2009). Given past rates of increase, we predict that mercury imports may more than double for 2011 (∼500 t/yr). Virtually all of Peru's mercury imports are used in artisanal gold mining. Much of the mining increase is unregulated and artisanal in nature, lacking environmental impact analysis or miner education. As a result, large quantities of mercury are being released into the atmosphere, sediments, and waterways. Other developing countries endowed with gold deposits are likely experiencing similar environmental destruction in response to recent record-high gold prices. The increasing availability of satellite imagery should prompt further studies linking economic variables with land use and land cover changes on the ground.
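Exponential-growth claims like these are typically checked by fitting a straight line to the logarithm of the series. The sketch below shows that log-linear fit and a simple projection; it is not the authors' analysis, and the example series is synthetic (no real import figures are used). `fit_exponential`, `project`, and `doubling_time` are hypothetical names.

```python
import math

import numpy as np


def fit_exponential(years, values):
    """Fit values ≈ a * exp(b * (year - year0)) by least squares on log(values)."""
    years = np.asarray(years, dtype=float)
    logs = np.log(np.asarray(values, dtype=float))
    year0 = years[0]
    b, log_a = np.polyfit(years - year0, logs, 1)
    return math.exp(log_a), b, year0


def project(a: float, b: float, year0: float, year: float) -> float:
    """Extrapolate the fitted curve to another year."""
    return a * math.exp(b * (year - year0))


def doubling_time(annual_rate: float) -> float:
    """Years for a quantity growing at a constant annual rate to double."""
    return math.log(2) / math.log(1 + annual_rate)
```

For scale, a constant ∼18%/yr increase such as the one quoted for the gold price gives `doubling_time(0.18)` of roughly 4.2 years. Extrapolating a fitted exponential only two years ahead, as in the 2011 prediction, assumes the growth rate stays constant.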
Abstract:
The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.
Abstract:
The purpose of this paper is to demonstrate a technique that uses underground mine drift profile data to estimate the absolute roughness of an underground mine drift, so that the Darcy-Weisbach equation can be applied in mine ventilation calculations. This technique could give mine ventilation engineers more accurate information on which to base their ventilation system designs. This paper presents preliminary work suggesting that it is possible to estimate the absolute roughness of drift-like tunnels by analyzing profile data (e.g., collected using a scanning laser rangefinder). The absolute roughness is then used to estimate the friction factor employed in the Darcy-Weisbach equation. The presented technique is based on an analysis of the spectral characteristics of profile ranges. Simulations based on real mine data are provided to illustrate the potential viability of this method. It is shown that mining drift roughness profiles appear similar to Gaussian profiles.
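The step from absolute roughness to a ventilation calculation can be sketched as follows. The paper's spectral estimation of roughness is not reproduced here: the sketch takes the absolute roughness ε as already known and assumes fully rough turbulent flow (high Reynolds number), for which the von Kármán relation 1/√f = −2 log₁₀(ε / (3.7 D_h)) gives the Darcy friction factor. The non-circular drift is handled through the hydraulic diameter D_h = 4A/P. Function names and example dimensions are hypothetical.

```python
import math


def hydraulic_diameter(area_m2: float, perimeter_m: float) -> float:
    """D_h = 4A/P for a non-circular airway."""
    return 4.0 * area_m2 / perimeter_m


def friction_factor_fully_rough(eps_m: float, d_h_m: float) -> float:
    """Darcy friction factor in the fully rough regime (von Karman):
    1/sqrt(f) = -2 log10(eps / (3.7 D_h))."""
    return (-2.0 * math.log10(eps_m / (3.7 * d_h_m))) ** -2


def darcy_weisbach_dp(f: float, length_m: float, d_h_m: float,
                      velocity_ms: float, rho_kgm3: float = 1.2) -> float:
    """Frictional pressure loss in Pa: dp = f * (L / D_h) * (rho * v^2 / 2)."""
    return f * (length_m / d_h_m) * 0.5 * rho_kgm3 * velocity_ms ** 2
```

For a hypothetical 4 m × 4 m drift (D_h = 4 m) with ε = 0.04 m, the relative roughness is 0.01 and f ≈ 0.038; at 3 m/s over 100 m of drift this gives a frictional loss of about 5 Pa. If the flow is not fully rough, the full Colebrook equation (which includes the Reynolds number) would be needed instead.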
Abstract:
We describe an approach aimed at addressing the issue of joint exploitation of control (stream) and data parallelism in a skeleton based parallel programming environment, based on annotations and refactoring. Annotations drive efficient implementation of a parallel computation. Refactoring is used to transform the associated skeleton tree into a more efficient, functionally equivalent skeleton tree. In most cases, cost models are used to drive the refactoring process. We show how sample use case applications/kernels may be optimized and discuss preliminary experiments with FastFlow assessing the theoretical results. © 2013 Springer-Verlag.
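To make the idea of cost-model-driven refactoring concrete, here is a toy skeleton tree with a commonly used idealized cost model: a sequential stage's service time is its latency, a pipeline's service time is that of its slowest stage, and a farm divides its worker's service time by the number of workers (emitter and collector overheads ignored). The rewrite shown collapses a tree into a single farm whose workers run all stages sequentially, a "normal form" transformation discussed in the skeleton literature. It assumes stateless stages and that output order does not matter. `Seq`, `Pipe`, `Farm`, and `normal_form` are hypothetical Python classes, not FastFlow's C++ API or the paper's annotation scheme.

```python
from dataclasses import dataclass
from typing import Union


@dataclass
class Seq:
    time: float  # latency of one sequential stage per task


@dataclass
class Pipe:
    stages: list  # child skeleton nodes, in pipeline order


@dataclass
class Farm:
    worker: "Skel"
    n: int  # number of replicated workers


Skel = Union[Seq, Pipe, Farm]


def service_time(s: Skel) -> float:
    """Idealized steady-state time between consecutive outputs."""
    if isinstance(s, Seq):
        return s.time
    if isinstance(s, Pipe):
        return max(service_time(st) for st in s.stages)  # slowest stage dominates
    return service_time(s.worker) / s.n  # ideal farm speedup


def seq_latency(s: Skel) -> float:
    """Total sequential work per task contained in a skeleton tree."""
    if isinstance(s, Seq):
        return s.time
    if isinstance(s, Pipe):
        return sum(seq_latency(st) for st in s.stages)
    return seq_latency(s.worker)


def normal_form(s: Skel, workers: int) -> Farm:
    """Rewrite the tree as a farm of the sequential composition of its stages."""
    return Farm(Seq(seq_latency(s)), workers)
```

For example, a two-stage pipeline with stage times 2 and 6 has service time 6 (the slow stage is a bottleneck), while its normal form on two workers has service time (2 + 6) / 2 = 4. With four workers, the normal form matches a hand-balanced pipeline of farms (one worker for the first stage, three for the second), both at service time 2.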
Abstract:
The study outlined in Testing Tidal Turbines Part 1 explains the variation in performance between turbines operating in steady and turbulent flow conditions. However, the impact of turbulence on devices is generally not well understood. Furthermore, the turbulence characteristics of high-velocity marine currents have not been extensively studied. Knowledge of these characteristics must therefore be expanded, and methodologies to predict their impact on devices must be developed and improved. This study examines the measurement of tidal currents at a site used for testing medium-scale tidal turbines. The data discussed were collected with an acoustic Doppler velocimeter (ADV), a point-measurement instrument. The processing procedures implemented are discussed, and the resulting estimated turbulence spectra and turbulence intensities are presented. The results contribute to improved knowledge of tidal current characteristics, which will be fundamental to optimising the design and operation of tidal stream devices.
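The two quantities reported here, turbulence intensity and the turbulence spectrum, can be computed from a velocity record roughly as sketched below. This assumes a single, already cleaned streamwise velocity series sampled at a constant rate. The study's actual processing (for example, despiking or Doppler-noise correction, which ADV data usually need) is not described in the abstract and is not reproduced. The function names are hypothetical, and the spectrum is a plain one-sided periodogram rather than a smoothed estimate.

```python
import numpy as np


def turbulence_intensity(u) -> float:
    """TI = std of the fluctuations u' about the record mean, divided by the mean."""
    u = np.asarray(u, dtype=float)
    fluct = u - u.mean()
    return float(np.sqrt(np.mean(fluct ** 2)) / u.mean())


def velocity_spectrum(u, fs_hz: float):
    """One-sided periodogram PSD of u' in (m/s)^2/Hz; sum(psd) * fs / n equals var(u)."""
    fluct = np.asarray(u, dtype=float)
    fluct = fluct - fluct.mean()
    n = len(fluct)
    psd = np.abs(np.fft.rfft(fluct)) ** 2 / (fs_hz * n)
    if n % 2 == 0:
        psd[1:-1] *= 2.0  # fold negative frequencies; DC and Nyquist bins are unique
    else:
        psd[1:] *= 2.0
    freqs = np.fft.rfftfreq(n, d=1.0 / fs_hz)
    return freqs, psd
```

In practice, long records are usually split into segments and their periodograms averaged (Welch's method) to reduce variance; the single periodogram here keeps the Parseval check (integrated PSD equals the variance) easy to see.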
Abstract:
The adulteration of extra virgin olive oil with other vegetable oils is a real problem with economic and health consequences. Current official methods have proved insufficient to detect such adulterations. One of the most concerning and hardest-to-detect adulterations with other vegetable oils is the addition of hazelnut oil. The main objective of this work was to develop a novel dimensionality reduction technique able to model oil mixtures as part of an integrated pattern recognition solution. This solution aims to identify hazelnut oil adulterants in extra virgin olive oil at low percentages based on spectroscopic chemical fingerprints. The proposed Continuous Locality Preserving Projections (CLPP) technique models the continuous nature of the in-house-produced admixtures as data series rather than discrete points. This methodology has the potential to be extended to other mixtures and adulterations of food products. Preserving the continuous structure of the data manifold allows better visualization of this classification problem and facilitates more accurate use of the manifold for detecting adulterants.
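CLPP's specific treatment of mixtures as continuous series is not detailed in the abstract, so the sketch below shows only the standard Locality Preserving Projections (LPP) method that the name refers to. It builds a k-nearest-neighbour graph with heat-kernel weights, forms the graph Laplacian L = D − W, and solves the generalized eigenproblem XᵀLX a = λ XᵀDX a, keeping the eigenvectors with the smallest eigenvalues. The neighbourhood size `k`, kernel width `t`, and ridge term `reg` are hypothetical defaults, and the function uses NumPy only.

```python
import numpy as np


def lpp(X: np.ndarray, n_components: int = 2, k: int = 5, t: float = 1.0,
        reg: float = 1e-6) -> np.ndarray:
    """Standard Locality Preserving Projections.

    X has shape (n_samples, n_features). Returns a projection matrix of shape
    (n_features, n_components); embed data with X @ projection."""
    n = X.shape[0]
    sq = np.sum((X[:, None, :] - X[None, :, :]) ** 2, axis=-1)  # pairwise squared distances
    W = np.zeros((n, n))
    for i in range(n):
        nbrs = np.argsort(sq[i])[1:k + 1]      # k nearest neighbours, skipping self
        W[i, nbrs] = np.exp(-sq[i, nbrs] / t)  # heat-kernel weights
    W = np.maximum(W, W.T)                     # symmetrize the graph
    D = np.diag(W.sum(axis=1))
    L = D - W                                  # graph Laplacian
    A_mat = X.T @ L @ X
    B_mat = X.T @ D @ X + reg * np.eye(X.shape[1])  # ridge keeps B positive definite
    # Solve A a = lambda B a by whitening with the Cholesky factor B = C C^T.
    C_inv = np.linalg.inv(np.linalg.cholesky(B_mat))
    M = C_inv @ A_mat @ C_inv.T
    _, V = np.linalg.eigh(M)                   # eigenvalues in ascending order
    return C_inv.T @ V[:, :n_components]       # smallest eigenvalues preserve locality
```

Applied to spectroscopic fingerprints (rows of `X`), `X @ lpp(X)` gives a low-dimensional embedding in which spectra that were neighbours in the original space stay close together. Preserving that neighbourhood structure is the property the abstract relies on for visualizing and classifying admixtures.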
Abstract:
In this paper we propose a graph stream clustering algorithm with a unified similarity measure on both the structural and attribute properties of vertices, with each attribute treated as a vertex. Unlike other approaches, ours does not require the number of clusters as an input parameter; instead, it dynamically creates new sketch-based clusters and periodically merges existing similar clusters. Experiments on two publicly available datasets reveal the advantages of our approach in detecting vertex clusters in the graph stream. We provide a detailed investigation of how parameters affect the algorithm's performance. We also provide a quantitative evaluation and comparison with a well-known offline community detection algorithm, which shows that our streaming algorithm can achieve comparable or better average cluster purity.
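The create-or-join and periodic-merge behaviour described above can be sketched with a simple threshold scheme. In this illustration, each arriving vertex is represented by the set of its neighbour IDs plus attribute tokens, so attributes act as extra vertices in one shared item space. Each cluster keeps an exact item-count summary, which stands in for the paper's sketches (their form is not given in the abstract). Similarity is Jaccard similarity. `StreamClusterer` and the thresholds `join_thr`, `merge_thr`, and `merge_every` are hypothetical.

```python
from collections import Counter


def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a or b else 0.0


class StreamClusterer:
    """Threshold-based streaming clustering with periodic merging (illustrative)."""

    def __init__(self, join_thr=0.3, merge_thr=0.5, merge_every=100):
        self.join_thr, self.merge_thr, self.merge_every = join_thr, merge_thr, merge_every
        self.summaries: list = []  # one Counter of items per cluster
        self.members: list = []    # one set of vertex ids per cluster
        self.seen = 0

    def add(self, vid, neighbours, attributes):
        # Structure and attributes share one item space; attributes act as extra vertices.
        items = set(neighbours) | {f"attr:{a}" for a in attributes}
        scores = [jaccard(items, set(s)) for s in self.summaries]
        if scores and max(scores) >= self.join_thr:
            c = scores.index(max(scores))  # join the most similar cluster
            self.summaries[c].update(items)
            self.members[c].add(vid)
        else:                              # no cluster is similar enough: start one
            self.summaries.append(Counter(items))
            self.members.append({vid})
        self.seen += 1
        if self.seen % self.merge_every == 0:
            self._merge()

    def _merge(self):
        """Greedily merge pairs of clusters whose item sets are similar enough."""
        i = 0
        while i < len(self.summaries):
            j = i + 1
            while j < len(self.summaries):
                if jaccard(set(self.summaries[i]), set(self.summaries[j])) >= self.merge_thr:
                    self.summaries[i].update(self.summaries.pop(j))
                    self.members[i] |= self.members.pop(j)
                else:
                    j += 1
            i += 1
```

As in the abstract, the number of clusters is never supplied: it grows when vertices do not fit existing clusters and shrinks at merge time. The two thresholds take over the role that a fixed cluster count would otherwise play.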
Abstract:
Promoter hypermethylation is central to deregulating gene expression in cancer. Identifying novel methylation targets in specific cancers provides a basis for their use as biomarkers of disease occurrence and progression. We developed an in silico strategy to globally identify potential targets of promoter hypermethylation in prostate cancer by screening for 5' CpG islands in 631 genes reported as downregulated in prostate cancer. This produced a virtual archive of 338 potential methylation targets. One candidate, IGFBP3, was selected for investigation, along with glutathione-S-transferase pi (GSTP1), a well-known methylation target in prostate cancer. Methylation of IGFBP3 was detected by quantitative methylation-specific PCR in 49/79 primary prostate adenocarcinomas and 7/14 adjacent preinvasive high-grade prostatic intraepithelial neoplasia samples, but in only 5/37 benign prostatic hyperplasia samples (P < 0.0001) and in 0/39 histologically normal adjacent prostate tissues, suggesting that IGFBP3 methylation may be involved in the early stages of prostate cancer development. Hypermethylation of IGFBP3 was detected only in samples that also showed methylation of GSTP1 and was correlated with Gleason score ≥ 7 (P = 0.01), indicating its potential as a prognostic marker. In addition, pharmacological demethylation induced strong expression of IGFBP3 in LNCaP prostate cancer cells. Our concept of a methylation candidate gene bank successfully identified a novel target of frequent hypermethylation in early-stage prostate cancer. Evaluation of further relevant genes could contribute towards a methylation signature of this disease.
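The abstract does not state which CpG island criteria the screen used. A widely used definition (Gardiner-Garden and Frommer, 1987) requires a region of at least 200 bp with GC content above 50% and an observed/expected CpG ratio above 0.6. The sketch below applies those thresholds in a sliding window. The function names and defaults are illustrative assumptions, and in a promoter screen the input would be a hypothetical 5' region around each gene's transcription start site.

```python
def cpg_stats(window: str) -> tuple:
    """Return (GC fraction, observed/expected CpG ratio) for a DNA window."""
    w = window.upper()
    n = len(w)
    c, g = w.count("C"), w.count("G")
    cpg = w.count("CG")  # CpG dinucleotides cannot overlap, so str.count is exact
    gc_frac = (c + g) / n if n else 0.0
    # Observed/expected CpG = CpG count * length / (C count * G count).
    obs_exp = cpg * n / (c * g) if c and g else 0.0
    return gc_frac, obs_exp


def has_cpg_island(seq: str, window: int = 200, step: int = 1,
                   min_gc: float = 0.5, min_obs_exp: float = 0.6) -> bool:
    """True if any window of length `window` meets both thresholds."""
    for start in range(0, len(seq) - window + 1, step):
        gc_frac, obs_exp = cpg_stats(seq[start:start + window])
        if gc_frac > min_gc and obs_exp > min_obs_exp:
            return True
    return False
```

Sequences shorter than the window never qualify, which matches the minimum-length criterion. Tools used in practice also merge overlapping qualifying windows into island coordinates rather than returning a yes/no answer.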
Abstract:
Inland waters are of global biogeochemical importance, receiving carbon inputs of ~4.8 Pg C y⁻¹. Of this, 12% is buried, 18% is transported to the oceans, and 70% supports aquatic secondary production. However, the mechanisms that determine the fate of organic matter (OM) in these systems are poorly defined. One important aspect is the formation of organo-mineral complexes in aquatic systems and their potential as a route for OM transport and burial versus their potential use as organic carbon (C) and nitrogen (N) sources. Organo-mineral particles form by sorption of dissolved OM to freshly eroded mineral surfaces and may contribute to ecosystem-scale particulate OM fluxes. We tested the availability of mineral-sorbed OM as a C and N source for streamwater microbial assemblages and streambed biofilms. Organo-mineral particles were constructed in vitro by sorption of ¹³C:¹⁵N-labelled amino acids to hydrated kaolin particles, and microbial degradation of these particles was compared with that of equivalent doses of ¹³C:¹⁵N-labelled free amino acids. Experiments were conducted in 120 ml mesocosms over 7 days using biofilms and streamwater sampled from the Oberer Seebach stream (Austria), tracing the assimilation and mineralization of ¹³C and ¹⁵N labels from mineral-sorbed and dissolved amino acids. Here we present data on the effects of organo-mineral sorption on amino acid mineralization and its C:N stoichiometry. Organo-mineral sorption had a significant effect on microbial activity, restricting C and N mineralization in both the biofilm and streamwater treatments. Distinct differences in community response were observed, with both dissolved and mineral-stabilized amino acids playing an enhanced role in the metabolism of the streamwater microbial community. Mineral sorption of amino acids differentially affected C and N mineralization and reduced the C:N ratio of the dissolved amino acid pool.
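For scale, the percentages quoted at the start of this abstract translate into the following approximate annual fluxes. This is simple arithmetic on the stated ~4.8 Pg C y⁻¹ total, so it carries the same uncertainty:

```latex
\begin{aligned}
\text{buried} &= 0.12 \times 4.8 \approx 0.58~\text{Pg C y}^{-1}\\
\text{transported to the oceans} &= 0.18 \times 4.8 \approx 0.86~\text{Pg C y}^{-1}\\
\text{supporting secondary production} &= 0.70 \times 4.8 \approx 3.36~\text{Pg C y}^{-1}
\end{aligned}
```

The three terms sum back to the ~4.8 Pg C y⁻¹ input.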
The present study demonstrates that organo-mineral complexes restrict microbial degradation of OM and may, consequently, alter the carbon and nitrogen cycling dynamics within aquatic ecosystems.