8 results for Data stream mining

at Duke University


Relevance:

80.00%

Publisher:

Abstract:

We use an information-theoretic method developed by Neifeld and Lee [J. Opt. Soc. Am. A 25, C31 (2008)] to analyze the performance of a slow-light system. Slow light is realized in this system via stimulated Brillouin scattering (SBS) in a 2 km-long, room-temperature, highly nonlinear fiber pumped by a laser whose spectrum is tailored and broadened to 5 GHz. We compute the information throughput (IT), which quantifies the fraction of information transferred from the source to the receiver, and the information delay (ID), which quantifies the delay of a data stream at which the information transfer is largest, for a range of experimental parameters. We also measure the eye-opening (EO) and signal-to-noise ratio (SNR) of the transmitted data stream and find that they scale in a similar fashion to the information-theoretic method. Our experimental findings are compared to a model of the slow-light system that accounts for all pertinent noise sources in the system as well as data-pulse distortion due to the filtering effect of the SBS process. The agreement between our observations and the predictions of our model is very good. Furthermore, we compare measurements of the IT for an optimal flattop gain profile and for a Gaussian-shaped gain profile. For a given pump-beam power, we find that the optimal profile gives a 36% larger ID and somewhat higher IT compared to the Gaussian profile. Specifically, the optimal (Gaussian) profile produces a fractional slow-light ID of 0.94 (0.69) and an IT of 0.86 (0.86) at a pump-beam power of 450 mW and a data rate of 2.5 Gbps. Thus, the optimal profile better utilizes the available pump-beam power, which is often a valuable resource in a system design.

Relevance:

40.00%

Publisher:

Abstract:

BACKGROUND: Over the past two decades more than fifty thousand unique clinical and biological samples have been assayed using the Affymetrix HG-U133 and HG-U95 GeneChip microarray platforms. This substantial repository has been used extensively to characterize changes in gene expression between biological samples, but has not been previously mined en masse for changes in mRNA processing. We explored the possibility of using HG-U133 microarray data to identify changes in alternative mRNA processing in several available archival datasets. RESULTS: Data from these and other gene expression microarrays can now be mined for changes in transcript isoform abundance using a program described here, SplicerAV. Using in vivo and in vitro breast cancer microarray datasets, SplicerAV was able to perform both gene- and isoform-specific expression profiling within the same microarray dataset. Our reanalysis of Affymetrix U133 plus 2.0 data generated by in vitro over-expression of HRAS, E2F3, beta-catenin (CTNNB1), SRC, and MYC identified several hundred oncogene-induced mRNA isoform changes, one of which recognized a previously unknown mechanism of EGFR family activation. Using clinical data, SplicerAV predicted 241 isoform changes between low and high grade breast tumors, with changes enriched among genes coding for guanyl-nucleotide exchange factors, metalloprotease inhibitors, and mRNA processing factors. Isoform changes in 15 genes were associated with aggressive cancer across the three breast cancer datasets. CONCLUSIONS: Using SplicerAV, we identified several hundred previously uncharacterized isoform changes induced by in vitro oncogene over-expression and revealed a previously unknown mechanism of EGFR activation in human mammary epithelial cells. We analyzed Affymetrix GeneChip data from over 400 human breast tumors in three independent studies, making this the largest clinical dataset analyzed for en masse changes in alternative mRNA processing. The capacity to detect RNA isoform changes in archival microarray data using SplicerAV allowed us to carry out the first analysis of isoform-specific mRNA changes directly associated with cancer survival.

Relevance:

30.00%

Publisher:

Abstract:

BACKGROUND: The inherent complexity of statistical methods and clinical phenomena compels researchers with diverse domains of expertise to work in interdisciplinary teams, where none of them has complete knowledge of their counterparts' fields. As a result, knowledge exchange may often be characterized by miscommunication leading to misinterpretation, ultimately resulting in errors in research and even clinical practice. Although communication has a central role in interdisciplinary collaboration and miscommunication can have a negative impact on research processes, to the best of our knowledge no study has yet explored how data analysis specialists and clinical researchers communicate over time. METHODS/PRINCIPAL FINDINGS: We conducted qualitative analysis of encounters between clinical researchers and data analysis specialists (epidemiologist, clinical epidemiologist, and data mining specialist). These encounters were recorded and systematically analyzed using a grounded theory methodology for extraction of emerging themes, followed by data triangulation and analysis of negative cases for validation. A policy analysis was then performed using a system dynamics methodology looking for potential interventions to improve this process. Four major emerging themes were found. Definitions using lay language were frequently employed as a way to bridge the language gap between the specialties. Thought experiments presented a series of "what if" situations that helped clarify how the method or information from the other field would behave if exposed to alternative situations, ultimately aiding in explaining their main objective. Metaphors and analogies were used to translate concepts across fields, from the unfamiliar to the familiar. Prolepsis was used to anticipate study outcomes, thus helping specialists understand the current context based on an understanding of their final goal.
CONCLUSION/SIGNIFICANCE: The communication between clinical researchers and data analysis specialists presents multiple challenges that can lead to errors.

Relevance:

30.00%

Publisher:

Abstract:

Mountaintop mining (MTM) is the primary procedure for surface coal exploration within the central Appalachian region of the eastern United States, and it is known to contaminate streams in local watersheds. In this study, we measured the chemical and isotopic compositions of water samples from MTM-impacted tributaries and streams in the Mud River watershed in West Virginia. We systematically document the isotopic compositions of three major constituents: sulfur isotopes in sulfate (δ(34)SSO4), carbon isotopes in dissolved inorganic carbon (δ(13)CDIC), and strontium isotopes ((87)Sr/(86)Sr). The data show that δ(34)SSO4, δ(13)CDIC, Sr/Ca, and (87)Sr/(86)Sr measured in saline- and selenium-rich MTM impacted tributaries are distinguishable from those of the surface water upstream of mining impacts. These tracers can therefore be used to delineate and quantify the impact of MTM in watersheds. High Sr/Ca and low (87)Sr/(86)Sr characterize tributaries that originated from active MTM areas, while tributaries from reclaimed MTM areas had low Sr/Ca and high (87)Sr/(86)Sr. Leaching experiments of rocks from the watershed show that pyrite oxidation and carbonate dissolution control the solute chemistry with distinct (87)Sr/(86)Sr ratios characterizing different rock sources. We propose that MTM operations that access the deeper Kanawha Formation generate residual mined rocks in valley fills from which effluents with distinctive (87)Sr/(86)Sr and Sr/Ca imprints affect the quality of the Appalachian watersheds.

Relevance:

30.00%

Publisher:

Abstract:

An enterprise information system (EIS) is an integrated data-applications platform characterized by diverse, heterogeneous, and distributed data sources. For many enterprises, a number of business processes still depend heavily on static rule-based methods and extensive human expertise. Enterprises are faced with the need for optimizing operation scheduling, improving resource utilization, discovering useful knowledge, and making data-driven decisions.

This thesis research is focused on real-time optimization and knowledge discovery that addresses workflow optimization, resource allocation, as well as data-driven predictions of process-execution times, order fulfillment, and enterprise service-level performance. In contrast to prior work on data analytics techniques for enterprise performance optimization, the emphasis here is on realizing scalable and real-time enterprise intelligence based on a combination of heterogeneous system simulation, combinatorial optimization, machine-learning algorithms, and statistical methods.

On-demand digital-print service is a representative enterprise requiring a powerful EIS. We use real-life data from Reischling Press, Inc. (RPI), a digital-print-service provider (PSP), to evaluate our optimization algorithms.

In order to handle the increase in volume and diversity of demands, we first present a high-performance, scalable, and real-time production scheduling algorithm for production automation based on an incremental genetic algorithm (IGA). The objective of this algorithm is to optimize the order dispatching sequence and balance resource utilization. Compared to prior work, this solution is scalable for a high volume of orders and it provides fast scheduling solutions for orders that require complex fulfillment procedures. Experimental results highlight its potential benefit in reducing production inefficiencies and enhancing the productivity of an enterprise.
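To make the idea of evolving an order dispatching sequence concrete, here is a minimal sketch of a permutation-based genetic algorithm. It is not the thesis's IGA: the abstract does not describe the algorithm's internals, so the single-resource tardiness objective, the order crossover, the swap mutation, and the warm-start reading of "incremental" in `reschedule` are all assumptions made for illustration. Resource balancing is not modeled.

```python
import random

def total_tardiness(sequence, jobs):
    """Sum of lateness past each due date when jobs run back to back on one resource."""
    clock, late = 0, 0
    for job_id in sequence:
        duration, due = jobs[job_id]
        clock += duration
        late += max(0, clock - due)
    return late

def order_crossover(parent_a, parent_b, rng):
    """Copy a slice from parent_a and fill the remaining slots in parent_b's order."""
    i, j = sorted(rng.sample(range(len(parent_a) + 1), 2))
    kept = parent_a[i:j]
    kept_set = set(kept)
    rest = [job for job in parent_b if job not in kept_set]
    return rest[:i] + kept + rest[i:]

def swap_mutation(sequence, rng):
    """Return a copy of the sequence with two positions exchanged."""
    mutated = sequence[:]
    a, b = rng.sample(range(len(mutated)), 2)
    mutated[a], mutated[b] = mutated[b], mutated[a]
    return mutated

def evolve(jobs, seed_sequence, generations=200, pop_size=30, rng=None):
    """Evolve dispatch sequences, keeping the better half of each generation (elitism)."""
    rng = rng or random.Random(0)
    population = [seed_sequence[:]]
    population += [swap_mutation(seed_sequence, rng) for _ in range(pop_size - 1)]
    for _ in range(generations):
        population.sort(key=lambda seq: total_tardiness(seq, jobs))
        elite = population[: pop_size // 2]
        children = []
        while len(elite) + len(children) < pop_size:
            parent_a, parent_b = rng.sample(elite, 2)
            child = order_crossover(parent_a, parent_b, rng)
            if rng.random() < 0.3:  # hypothetical mutation probability
                child = swap_mutation(child, rng)
            children.append(child)
        population = elite + children
    return min(population, key=lambda seq: total_tardiness(seq, jobs))

def reschedule(jobs, previous_best, new_job_ids, **kwargs):
    """Assumed incremental step: warm-start from the last schedule plus newly arrived orders."""
    seed = [job for job in previous_best if job in jobs] + list(new_job_ids)
    return evolve(jobs, seed, **kwargs)
```

Here `jobs` maps an order id to a `(duration, due_time)` pair. Because the best sequences survive each generation, the returned schedule is never worse than the seed under this objective, which is what makes a warm start from the previous schedule a cheap way to absorb new orders.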

We next discuss analysis and prediction of different attributes involved in hierarchical components of an enterprise. We start from a study of the fundamental processes related to real-time prediction. Our process-execution time and process status prediction models integrate statistical methods with machine-learning algorithms. In addition to improving prediction accuracy compared to stand-alone machine-learning algorithms, they also provide a probabilistic estimation of the predicted status. An order generally consists of multiple series and parallel processes. We next introduce an order-fulfillment prediction model that combines advantages of multiple classification models by incorporating flexible decision-integration mechanisms. Experimental results show that adopting due dates recommended by the model can significantly reduce enterprise late-delivery ratio. Finally, we investigate service-level attributes that reflect the overall performance of an enterprise. We analyze and decompose time-series data into different components according to their hierarchical periodic nature, perform correlation analysis, and develop univariate prediction models for each component as well as multivariate models for correlated components. Predictions for the original time series are aggregated from the predictions of its components. In addition to a significant increase in mid-term prediction accuracy, this distributed modeling strategy also improves short-term time-series prediction accuracy.
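The decompose, predict per component, then aggregate strategy can be illustrated with a small sketch. The abstract does not name the decomposition or the per-component models, so everything below is an assumption chosen for simplicity: one periodic level only, a straight-line trend fitted to window averages, a per-phase seasonal profile, and a mean-value model for the residual.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least-squares line y = intercept + slope * x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return y_bar - slope * x_bar, slope

def decompose(series, period):
    """Split a series into a linear trend, a periodic profile, and a residual."""
    n = len(series)
    # Averaging over one full period cancels the periodic part; each average
    # is placed at the centre of its window before fitting the trend line.
    centers = [i + (period - 1) / 2 for i in range(n - period + 1)]
    window_means = [mean(series[i:i + period]) for i in range(n - period + 1)]
    intercept, slope = fit_line(centers, window_means)
    detrended = [y - (intercept + slope * t) for t, y in enumerate(series)]
    # Seasonal profile: average detrended value at each phase of the period.
    profile = [mean(detrended[k::period]) for k in range(period)]
    residual = [d - profile[t % period] for t, d in enumerate(detrended)]
    return (intercept, slope), profile, residual

def forecast(series, period, horizon):
    """Predict each component separately, then sum the component predictions."""
    (intercept, slope), profile, residual = decompose(series, period)
    residual_level = mean(residual)  # simplest possible residual model
    start = len(series)
    return [
        intercept + slope * t + profile[t % period] + residual_level
        for t in range(start, start + horizon)
    ]
```

The series needs at least `period + 1` samples so that the trend line has two window averages to fit. A hierarchical version would repeat the seasonal step for each nested period (for example daily within weekly) before modeling the residual.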

In summary, this thesis research has led to a set of characterization, optimization, and prediction tools for an EIS to derive insightful knowledge from data and use them as guidance for production management. It is expected to provide solutions for enterprises to increase reconfigurability, accomplish more automated procedures, and obtain data-driven recommendations or effective decisions.

Relevance:

30.00%

Publisher:

Abstract:

Selenium (Se) is a micronutrient necessary for the function of a variety of important enzymes; Se also exhibits a narrow range in concentrations between essentiality and toxicity. Oviparous vertebrates such as birds and fish are especially sensitive to Se toxicity, which causes reproductive impairment and defects in embryo development. Selenium occurs naturally in the Earth's crust, but it can be mobilized by a variety of anthropogenic activities, including agricultural practices, coal burning, and mining.

Mountaintop removal/valley fill (MTR/VF) coal mining is a form of surface mining found throughout central Appalachia in the United States that involves blasting off the tops of mountains to access underlying coal seams. Spoil rock from the mountain is placed into adjacent valleys, forming valley fills, which bury stream headwaters and negatively impact surface water quality. This research focused on the biological impacts of Se leached from MTR/VF coal mining operations located around the Mud River, West Virginia.

In order to assess the status of Se in a lotic (flowing) system such as the Mud River, surface water, insects, and fish samples including creek chub (Semotilus atromaculatus) and green sunfish (Lepomis cyanellus) were collected from a mining impacted site as well as from a reference site not impacted by mining. Analysis of samples from the mined site showed increased conductivity and Se in the surface waters compared to the reference site in addition to increased concentrations of Se in insects and fish. Histological analysis of mined site fish gills showed a lack of normal parasites, suggesting parasite populations may be disrupted due to poor water quality. X-ray absorption near edge spectroscopy techniques were used to determine the speciation of Se in insect and creek chub samples. Insects contained approximately 40-50% inorganic Se (selenate and selenite) and 50-60% organic Se (Se-methionine and Se-cystine) while fish tissues contained lower proportions of inorganic Se than insects, instead having higher proportions of organic Se in the forms of methyl-Se-cysteine, Se-cystine, and Se-methionine.

Otoliths, calcified inner ear structures, were also collected from Mud River creek chubs and green sunfish and analyzed for Se content using laser ablation inductively coupled plasma mass spectrometry (LA-ICP-MS). Significant differences were found between the two species of fish, based on the concentrations of otolith Se. Green sunfish otoliths from all sites contained background or low concentrations of otolith Se (< 1 µg/g) that were not significantly different between mined and unmined sites. In contrast, creek chub otoliths from the historically mined site contained much higher (≥ 5 µg/g, up to approximately 68 µg/g) concentrations of Se than for the same species in the unmined site or for the green sunfish. Otolith Se concentrations were related to muscle Se concentrations for creek chubs (R² = 0.54, p = 0.0002 for the last 20% of the otolith Se versus muscle Se) while no relationship was observed for green sunfish.

Additional experiments using biofilms grown in the Mud River showed increased Se in mined site biofilms compared to the reference site. When we fed fathead minnows (Pimephales promelas) on these biofilms in the laboratory, they accumulated higher concentrations of Se in liver and ovary tissues compared to fathead minnows fed on reference site biofilms. No differences in Se accumulation were found in muscle from either treatment group. Biofilms were also centrifuged and separated into filamentous green algae and the remaining diatom fraction. The majority of Se was found in the diatom fraction, with only about one third of total biofilm Se concentration present in the filamentous green algae fraction.

Finally, zebrafish (Danio rerio) embryos were exposed to aqueous Se in the form of selenate, selenite, and L-selenomethionine in an attempt to determine if oxidative stress plays a role in selenium embryo toxicity. Selenate and selenite exposure did not induce embryo deformities (lordosis and craniofacial malformation). L-selenomethionine, however, induced significantly higher deformity rates at 100 µg/L compared to controls. Antioxidant rescue of L-selenomethionine-induced deformities was attempted in embryos using N-acetylcysteine (NAC). Pretreatment with NAC significantly reduced deformities in the zebrafish embryos secondarily treated with L-selenomethionine, suggesting that oxidative stress may play a role in Se toxicity. Selenite exposure also induced a 6.6-fold increase in glutathione-S-transferase pi class 2 gene expression, which is involved in xenobiotic transformation. No changes in gene expression were observed for selenate or L-selenomethionine-exposed embryos.

The findings in this dissertation contribute to the understanding of how Se bioaccumulates in a lotic system and is transferred through a simulated foodweb in addition to further exploring oxidative stress as a potential mechanism for Se-induced embryo toxicity. Future studies should continue to pursue the role of oxidative stress and other mechanisms in Se toxicity and the biotransformation of Se in aquatic ecosystems.

Relevance:

30.00%

Publisher:

Abstract:

Many factors such as poverty, ineffective institutions and environmental regulations may prevent developing countries from managing how natural resources are extracted to meet a strong market demand. Extraction for some resources has reached such proportions that evidence is measurable from space. We present recent evidence of the global demand for a single commodity and the ecosystem destruction resulting from commodity extraction, recorded by satellites for one of the most biodiverse areas of the world. We find that since 2003, recent mining deforestation in Madre de Dios, Peru is increasing nonlinearly alongside a constant annual rate of increase in international gold price (∼18%/yr). We detect that the new pattern of mining deforestation (1915 ha/year, 2006-2009) is outpacing that of nearby settlement deforestation. We show that gold price is linked with exponential increases in Peruvian national mercury imports over time (R(2) = 0.93, p = 0.04, 2003-2009). Given the past rates of increase we predict that mercury imports may more than double for 2011 (∼500 t/year). Virtually all of Peru's mercury imports are used in artisanal gold mining. Much of the mining increase is unregulated/artisanal in nature, lacking environmental impact analysis or miner education. As a result, large quantities of mercury are being released into the atmosphere, sediments and waterways. Other developing countries endowed with gold deposits are likely experiencing similar environmental destruction in response to recent record high gold prices. The increasing availability of satellite imagery ought to evoke further studies linking economic variables with land use and cover changes on the ground.
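As a rough illustration of what a constant annual rate of increase implies, and of how an exponential trend can be fitted to yearly figures, the sketch below computes a doubling time and performs a log-linear least-squares fit. The function names are hypothetical, and the abstract does not state how its R² value was obtained; the R² here is computed in log space, which is only one possible convention.

```python
import math

def doubling_time(annual_rate):
    """Years needed to double at a constant fractional annual growth rate."""
    return math.log(2) / math.log(1 + annual_rate)

def fit_exponential(years, values):
    """Fit values ~ A * (1 + rate) ** year by least squares on log(values).

    Returns (rate, r_squared), with r_squared measured in log space.
    """
    logs = [math.log(v) for v in values]
    n = len(years)
    x_bar = sum(years) / n
    y_bar = sum(logs) / n
    sxx = sum((x - x_bar) ** 2 for x in years)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(years, logs))
    slope = sxy / sxx
    # Residuals are taken around the centred fit to limit rounding error.
    ss_res = sum(
        (y - (y_bar + slope * (x - x_bar))) ** 2 for x, y in zip(years, logs)
    )
    ss_tot = sum((y - y_bar) ** 2 for y in logs)
    return math.exp(slope) - 1, 1 - ss_res / ss_tot
```

At the roughly 18% per year quoted for the gold price, `doubling_time(0.18)` comes out a little over four years. Any series passed to `fit_exponential` must contain strictly positive values, since the fit works on their logarithms.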

Relevance:

30.00%

Publisher:

Abstract:

The application of semantic technologies to the integration of biological data and the interoperability of bioinformatics analysis and visualization tools has been the common theme of a series of annual BioHackathons hosted in Japan for the past five years. Here we provide a review of the activities and outcomes from the BioHackathons held in 2011 in Kyoto and 2012 in Toyama. In order to efficiently implement semantic technologies in the life sciences, participants formed various sub-groups and worked on the following topics: Resource Description Framework (RDF) models for specific domains, text mining of the literature, ontology development, essential metadata for biological databases, platforms to enable efficient Semantic Web technology development and interoperability, and the development of applications for Semantic Web data. In this review, we briefly introduce the themes covered by these sub-groups. The observations made, conclusions drawn, and software development projects that emerged from these activities are discussed.