802 results for Data stream mining
Abstract:
Hydrologic transport of dissolved organic carbon (DOC) from peat soils may differ from that in organo-mineral soils in how it responds to changes in flow, because of differences in soil profile and hydrology. In well-drained organo-mineral soils, low flow passes through the lower mineral layer where DOC is adsorbed, and high flow passes through the upper organic layer where DOC is produced. DOC concentrations in streams draining organo-mineral soils therefore typically increase with flow. In saturated peat soils, both high and low flows pass through an organic layer where DOC is produced. DOC in stream water draining peat may therefore not increase in response to changes in flow, as there is no switch in flow path between a mineral and an organic layer. To verify this, we conducted a high-resolution monitoring study of soil and stream water at an upland peat catchment in northern England. Our data showed a strong positive correlation between DOC concentrations at −1 and −5 cm depth and in stream water, and weaker correlations between concentrations at −20 to −50 cm depth and in stream water. Although near-surface organic material appears to be the key source of stream water DOC in both peat and organo-mineral soils, we observed a negative correlation between stream flow and DOC concentrations instead of a positive one, as DOC released from organic layers during low and high flow was diluted by rainfall. The differences in DOC transport processes between peat and organo-mineral soils have different implications for our understanding of long-term changes in DOC exports. While increased rainfall may cause an increase in DOC flux from peat due to the increase in water volume, it may cause a decrease in concentrations. This response is contrary to the expected change in DOC exports from organo-mineral soils, where increased rainfall is likely to result in an increase in both flux and concentration.
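As a rough illustration of the correlation screening described (not the study's actual code or data), a sketch with synthetic series shaped to mimic the reported pattern; all names and values are placeholders:

```python
# Sketch of correlating soil-water DOC at several depths, and flow, with stream
# DOC. All series are synthetic placeholders shaped to mimic the reported
# pattern (near-surface soil tracking stream DOC; dilution at high flow).
import numpy as np
from scipy.stats import spearmanr

rng = np.random.default_rng(0)
n = 500
flow = rng.lognormal(0.0, 0.5, n)                                # discharge (placeholder units)
stream_doc = 12.0 - 2.0 * np.log(flow) + rng.normal(0, 0.5, n)   # dilution at high flow
soil_doc_1cm = stream_doc + rng.normal(0, 1.0, n)                # near-surface source signal
soil_doc_50cm = rng.normal(12.0, 2.0, n)                         # deep layer, largely decoupled

for label, series in [("soil DOC at -1 cm", soil_doc_1cm),
                      ("soil DOC at -50 cm", soil_doc_50cm),
                      ("flow", flow)]:
    rho, p = spearmanr(series, stream_doc)
    print(f"{label} vs stream DOC: rho = {rho:+.2f} (p = {p:.1e})")
```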
Abstract:
The Genetic Analysis Workshop 15 (GAW15) Problem 1 contained baseline expression levels of 8793 genes in immortalised B cells from 194 individuals in 14 Centre d'Etude du Polymorphisme Humain (CEPH) Utah pedigrees. Previous analysis of the data showed linkage and association, and evidence of substantial individual variation. In particular, correlations were examined among the expression levels of 31 genes and 25 target genes corresponding to two master regulatory regions. In this analysis, we apply Bayesian network analysis to gain further insight into these findings. We identify strong dependencies and thereby provide additional insight into the underlying relationships between the genes involved. More generally, the approach is expected to be applicable to integrated analysis of genes on biological pathways.
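A minimal sketch of score-based Bayesian network structure learning of the kind applied here, assuming the pgmpy library and a random placeholder expression matrix (the library choice, gene names and data are illustrative, not the authors'):

```python
# Sketch: score-based Bayesian network structure learning on an expression matrix.
# pgmpy is one common choice; the data is a random placeholder, not GAW15 data.
import numpy as np
import pandas as pd
from pgmpy.estimators import HillClimbSearch, BicScore

rng = np.random.default_rng(1)
# Placeholder: 194 individuals x 5 genes, expression discretized into 3 levels.
data = pd.DataFrame(rng.integers(0, 3, size=(194, 5)),
                    columns=["geneA", "geneB", "geneC", "geneD", "geneE"])

est = HillClimbSearch(data)
dag = est.estimate(scoring_method=BicScore(data))  # greedy search over DAGs by BIC
print(sorted(dag.edges()))  # learned dependencies between genes
```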
Abstract:
In a world of almost permanent and rapidly increasing electronic data availability, techniques for filtering, compressing, and interpreting this data to transform it into valuable and easily comprehensible information are of utmost importance. One key topic in this area is the capability to deduce future system behavior from a given data input. This book brings together for the first time the complete theory of data-based neurofuzzy modelling and the linguistic attributes of fuzzy logic in a single cohesive mathematical framework. After introducing the basic theory of data-based modelling, new concepts including extended additive and multiplicative submodels are developed, and their extensions to state estimation and data fusion are derived. All these algorithms are illustrated with benchmark and real-life examples to demonstrate their efficiency. Chris Harris and his group have carried out pioneering work which has tied together the fields of neural networks and linguistic rule-based algorithms. This book is aimed at researchers and scientists in time series modeling, empirical data modeling, knowledge discovery, data mining, and data fusion.
Abstract:
Aircraft Maintenance, Repair and Overhaul (MRO) feedback commonly includes an engineer's complex text-based inspection report. Capturing and normalizing the content of these textual descriptions is vital to cost and quality benchmarking, and provides information to facilitate continuous improvement of MRO processes and analytics. As data analysis and mining tools require highly normalized data, raw textual data is inadequate. This paper offers a text-mining solution to efficiently analyse bulk textual feedback data. Despite replacement of the same parts and/or sub-parts, the actual service cost for the same repair is often distinctly different from that of similar previous jobs. Regular expression algorithms were combined with an aircraft MRO glossary dictionary to help provide additional information concerning the reasons for cost variation. Professional terms and conventions were included within the dictionary to avoid ambiguity and improve the results. Testing shows that most descriptive inspection reports can be appropriately interpreted, allowing extraction of highly normalized data. This additional normalized data strongly supports data analysis and data mining, whilst also increasing the accuracy of future quotation costing. The solution has been used effectively by a large aircraft MRO agency with positive results.
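A minimal sketch of the glossary-plus-regular-expression idea (the glossary entries, field names and report text are invented for illustration, not the paper's dictionary):

```python
# Sketch: glossary-driven regex extraction from free-text inspection reports.
import re

# Map glossary/conventional terms to normalized field values (invented examples).
GLOSSARY = {
    r"\bcorro(?:ded|sion)\b": ("defect", "corrosion"),
    r"\bcrack(?:ed|ing)?\b": ("defect", "crack"),
    r"\breplac(?:e|ed|ement)\b": ("action", "replace"),
    r"\bP/N\s*([A-Z0-9-]+)": ("part_number", None),  # capture group holds the value
}

def normalize(report: str) -> dict:
    """Return normalized fields extracted from one free-text report."""
    fields = {}
    for pattern, (field, value) in GLOSSARY.items():
        m = re.search(pattern, report, flags=re.IGNORECASE)
        if m:
            fields[field] = value if value is not None else m.group(1)
    return fields

print(normalize("Found corrosion on bracket, replaced item P/N AB12-34."))
# {'defect': 'corrosion', 'action': 'replace', 'part_number': 'AB12-34'}
```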
Abstract:
This paper examines two hydrochemical time-series derived from stream samples taken in the Upper Hafren catchment, Plynlimon, Wales. One time-series comprises data collected at 7-hour intervals over 22 months (Neal et al., submitted, this issue), while the other is based on weekly sampling over 20 years. A subset of determinands: aluminium, calcium, chloride, conductivity, dissolved organic carbon, iron, nitrate, pH, silicon and sulphate are examined within a framework of non-stationary time-series analysis to identify determinand trends, seasonality and short-term dynamics. The results demonstrate that both long-term and high-frequency monitoring provide valuable and unique insights into the hydrochemistry of a catchment. The long-term data allowed analysis of long-term trends, demonstrating continued increases in DOC concentrations accompanied by declining SO4 concentrations within the stream, and provided new insights into the changing amplitude and phase of the seasonality of determinands such as DOC and Al. Additionally, these data proved invaluable for placing the short-term variability demonstrated within the high-frequency data within context. The 7-hour data highlighted complex diurnal cycles for NO3, Ca and Fe with cycles displaying changes in phase and amplitude on a seasonal basis. The high-frequency data also demonstrated the need to consider the impact that the time of sample collection can have on the summary statistics of the data and also that sampling during the hours of darkness provides additional hydrochemical information for determinands which exhibit pronounced diurnal variability. Moving forward, this research demonstrates the need for both long-term and high-frequency monitoring to facilitate a full and accurate understanding of catchment hydrochemical dynamics.
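As one concrete way to separate trend, seasonality and short-term dynamics in such records, an STL decomposition sketch on a synthetic weekly series (STL is a stand-in here, not necessarily the authors' method; all data are placeholders):

```python
# Sketch: splitting a weekly water-quality series into trend, seasonal and
# residual components with STL. The series is synthetic, not the Plynlimon data.
import numpy as np
import pandas as pd
from statsmodels.tsa.seasonal import STL

rng = np.random.default_rng(2)
weeks = pd.date_range("1990-01-01", periods=20 * 52, freq="W")
trend = np.linspace(2.0, 4.0, len(weeks))                        # rising DOC, mg/L
season = 0.8 * np.sin(2 * np.pi * np.arange(len(weeks)) / 52)    # annual cycle
doc = pd.Series(trend + season + rng.normal(0, 0.3, len(weeks)), index=weeks)

res = STL(doc, period=52, robust=True).fit()
print(res.trend.iloc[[0, -1]])   # long-term trend at start and end of record
print(res.seasonal.abs().groupby(res.seasonal.index.year).max().head())  # seasonal amplitude by year
```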
Abstract:
There is growing interest in the ways in which the location of a person can be utilized by new applications and services. Recent advances in mobile technologies have meant that the technical capability to record and transmit location data for processing is appearing in off-the-shelf handsets. This opens possibilities to profile people based on the places they visit, people they associate with, or other aspects of their complex routines determined through persistent tracking. It is possible that services offering customized information based on the results of such behavioral profiling could become commonplace. However, it may not be immediately apparent to the user that a wealth of information about them, potentially unrelated to the service, can be revealed. Further issues occur if the user agreed, while subscribing to the service, for data to be passed to third parties where it may be used to their detriment. Here, we report in detail on a short case study tracking four people, in three European member states, persistently for six weeks using mobile handsets. The GPS locations of these people have been mined to reveal places of interest and to create simple profiles. The information drawn from the profiling activity ranges from intuitive through special cases to insightful. In this paper, these results and further extensions to the technology are considered in light of European legislation to assess the privacy implications of this emerging technology.
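Density-based clustering is one common way to mine GPS fixes for places of interest; a sketch with DBSCAN and a haversine metric (an assumed technique for illustration, with invented coordinates, not the study's algorithm or data):

```python
# Sketch: mining GPS fixes for "places of interest" by density clustering.
import numpy as np
from sklearn.cluster import DBSCAN

EARTH_RADIUS_M = 6_371_000
fixes_deg = np.array([          # [lat, lon] in degrees; invented placeholder fixes
    [51.5007, -0.1246],         # repeated visits near one location
    [51.5008, -0.1247],
    [51.5006, -0.1245],
    [48.8584,  2.2945],         # second frequented location
    [48.8585,  2.2946],
    [40.6892, -74.0445],        # a one-off fix -> noise
])

db = DBSCAN(eps=100 / EARTH_RADIUS_M,    # 100 m expressed in radians
            min_samples=2, metric="haversine",
            algorithm="ball_tree").fit(np.radians(fixes_deg))
print(db.labels_)  # e.g. [0 0 0 1 1 -1]: two places of interest, one noise point
```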
Abstract:
In basic network transactions, datagrams from source to destination are routed through numerous routers and paths, depending on which paths are free and uncongested; the resulting transmission routes can be long, incurring greater delay, jitter and congestion and reduced throughput. One of the major problems of packet-switched networks is cell delay variation, or jitter. This cell delay variation is due to queuing delay, which depends on the applied loading conditions. The accumulation of delay and jitter with the number of nodes along a transmission route, together with dropped packets, adds further complexity for multimedia traffic, because there is no guarantee that each traffic stream will be delivered according to its own jitter constraints; there is therefore a need to analyse the effects of jitter. IP routers use a single path for the transmission of all packets. Multi-Protocol Label Switching (MPLS), on the other hand, separates packet forwarding from routing, enabling packets to use appropriate routes and allowing the behaviour of transmission paths to be optimized and controlled, thereby correcting some of the shortfalls associated with IP routing. MPLS has therefore been used in this analysis for effective transmission through the various networks. This paper analyzes the effects of delay, congestion, interference, jitter and packet loss in the transmission of signals from source to destination. The impact of link failures and repair paths in various physical topologies, namely bus, star, mesh and hybrid, is also analyzed under standard network conditions.
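One standard way to quantify the jitter discussed here is the RFC 3550 interarrival-jitter estimator; a minimal sketch with invented timestamps:

```python
# Sketch: RFC 3550 interarrival jitter from send/arrival timestamps (seconds).
# Timestamps below are invented; real values would come from packet captures.
def interarrival_jitter(send_times, arrival_times):
    """Smoothed jitter estimate J, updated as J += (|D| - J) / 16 per packet."""
    j = 0.0
    for i in range(1, len(send_times)):
        # D = change in transit time between consecutive packets
        d = (arrival_times[i] - arrival_times[i - 1]) - (send_times[i] - send_times[i - 1])
        j += (abs(d) - j) / 16.0
    return j

send = [0.00, 0.02, 0.04, 0.06, 0.08]
arrive = [0.10, 0.125, 0.142, 0.168, 0.185]  # uneven queuing delays
print(f"jitter estimate: {interarrival_jitter(send, arrive) * 1000:.3f} ms")
```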
Abstract:
A weekly programme of water quality monitoring has been conducted by Slapton Ley Field Centre since 1970. Samples have been collected for the four main streams draining into Slapton Ley, from the Ley itself and from other sites within the catchment. On occasions, more frequent sampling has been undertaken during short-term research projects, usually in relation to nutrient export from the catchment. These water quality data, unparalleled in length for a series of small drainage basins in the British Isles, provide a unique resource for analysis of spatial and temporal variations in stream water quality within an agricultural area. Not surprisingly, given the eutrophic status of the Ley, most attention has focused on the nutrients nitrate and phosphate. A number of approaches to modelling nutrient loss have been attempted, including time series analysis and the application of nutrient export and physically-based models.
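For context, the nutrient export such records support is commonly estimated as the sum of concentration times discharge over each sampling interval; a minimal sketch with invented numbers (not Slapton Ley data):

```python
# Sketch: nutrient load as the sum of concentration x discharge per interval.
# Weekly sampling implies interpolation/averaging assumptions; values invented.
concentrations_mg_per_l = [6.2, 5.8, 7.1, 6.5]    # nitrate at weekly samples, mg/L
discharges_l_per_s = [120.0, 95.0, 210.0, 150.0]  # mean discharge per week, L/s
seconds_per_week = 7 * 24 * 3600

load_kg = sum(c * q * seconds_per_week
              for c, q in zip(concentrations_mg_per_l, discharges_l_per_s)) / 1e6
print(f"4-week nitrate export: {load_kg:.1f} kg")  # mg -> kg via / 1e6
```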
Abstract:
This article explores the contribution that artisanal and small-scale mining (ASM) makes to poverty reduction in Tanzania, based on data on gold and diamond mining in Mwanza Region. The evidence suggests that people working in mining or related services are less likely to be in poverty than those with other occupations. However, the picture is complex; while mining income can help reduce poverty and provide a buffer from livelihood shocks, people's inability to obtain a formal mineral claim, or to effectively exploit their claims, contributes to insecurity. This is reinforced by a context in which ASM is peripheral to large-scale mining interests, is only gradually being addressed within national poverty reduction policies, and is segregated from district-level planning.
Abstract:
Global communication requirements and load imbalance of some parallel data mining algorithms are the major obstacles to exploiting the computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication cost in parallel data mining algorithms, in particular in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operation, which hinders the scalability of the approach. This work studies a different parallel formulation of the algorithm in which the requirement of global communication is removed, while maintaining the same deterministic nature as the centralised algorithm. The proposed approach exploits a non-uniform data distribution which can either be found in real-world distributed applications or be induced by means of multi-dimensional binary search trees. The approach can also be extended to accommodate an approximation error, which allows a further reduction of the communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing elements.
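A minimal mpi4py sketch of the straightforward formulation the abstract describes, with uniformly distributed data and the per-iteration global reduction that limits scalability (this is the baseline, not the paper's communication-avoiding method; all data are random placeholders):

```python
# Sketch: baseline parallel k-means, one global reduction per iteration.
# Run with e.g. `mpiexec -n 4 python kmeans_mpi.py`.
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rng = np.random.default_rng(comm.Get_rank())
local_data = rng.normal(size=(1000, 2))  # this node's uniform share of the data
k = 3

init = rng.normal(size=(k, 2)) if comm.Get_rank() == 0 else None
centroids = comm.bcast(init)             # all nodes start from the same centroids

for _ in range(10):
    # Assignment step: nearest centroid for each local point.
    dists = np.linalg.norm(local_data[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    # Local partial sums and counts per cluster.
    local_sums = np.array([local_data[labels == j].sum(axis=0) for j in range(k)])
    local_counts = np.array([(labels == j).sum() for j in range(k)], dtype=float)
    # The global reduction that hinders scalability at extreme scale.
    global_sums = np.empty_like(local_sums)
    global_counts = np.empty_like(local_counts)
    comm.Allreduce(local_sums, global_sums, op=MPI.SUM)
    comm.Allreduce(local_counts, global_counts, op=MPI.SUM)
    centroids = global_sums / np.maximum(global_counts, 1.0)[:, None]

if comm.Get_rank() == 0:
    print(centroids)
```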
Abstract:
The use of ageostrophic flow to infer the presence of vertical circulations in the entrances and exits of the climatological jet streams is questioned. Problems of interpretation arise because of the use of different definitions of geostrophy in theoretical studies and in analyses of atmospheric data. The nature and role of the ageostrophic flow based on constant and variable Coriolis parameter definitions of geostrophy vary. In the latter the geostrophic divergence cannot be neglected, so the vertical motion is not associated solely with the ageostrophic flow. Evidence is presented suggesting that ageostrophic flow in the climatological jet streams is primarily determined by the kinematic requirements of wave retrogression rather than by a forcing process. These requirements are largely met by the rotational flow, with the divergent circulations present being geostrophically forced, and so playing a secondary, restoring role.
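The contrast between the two definitions of geostrophy can be made explicit with the standard textbook relations (these are general definitions, not equations taken from the paper): with geopotential Φ and Coriolis parameter f,

```latex
% Geostrophic wind and its divergence when f is allowed to vary with latitude.
\mathbf{v}_g = \frac{1}{f}\,\hat{\mathbf{k}} \times \nabla\Phi,
\qquad
\nabla \cdot \mathbf{v}_g
  = \nabla\!\Big(\frac{1}{f}\Big) \cdot \big(\hat{\mathbf{k}} \times \nabla\Phi\big)
  = -\frac{\beta}{f}\, v_g,
\qquad \beta \equiv \frac{\partial f}{\partial y}.
```

With constant f the geostrophic divergence vanishes, so vertical motion must be carried entirely by the ageostrophic flow; with variable f the divergence term is nonzero, which is why the abstract argues vertical motion cannot be attributed solely to the ageostrophic component.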
Abstract:
In the context of environmental valuation of natural disasters, an important component of the evaluation procedure lies in determining the periodicity of events. This paper explores alternative methodologies for determining such periodicity, illustrating the advantages and disadvantages of the separate methods and their comparative predictions. The procedures employ Bayesian inference and explore recent advances in computational aspects of mixtures methodology. They are applied to the classic data set of Maguire et al. (Biometrika, 1952), subsequently updated by Jarrett (Biometrika, 1979), which together comprise the seminal investigations examining the periodicity of mining disasters within the United Kingdom, 1851–1962.
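For orientation, the classic Bayesian treatment of this disaster series is a single change point between two Poisson rates; the Gibbs sampler below sketches that textbook analysis (it is not the mixtures methodology the paper develops, and the counts are random placeholders, not the Maguire/Jarrett data):

```python
# Sketch: Gibbs sampler for one change point k in yearly disaster counts,
# with Poisson rates lam1 before and lam2 after, and Gamma(a, b) priors.
import numpy as np

rng = np.random.default_rng(3)
y = np.concatenate([rng.poisson(3.0, 40), rng.poisson(1.0, 72)])  # fake 112 years
n = len(y)
a, b = 2.0, 1.0                     # Gamma prior hyperparameters for both rates
k, lam1, lam2 = n // 2, 1.0, 1.0    # initial values
cum = np.cumsum(y)

for _ in range(5000):
    # Conjugate Gamma full conditionals for the two rates.
    lam1 = rng.gamma(a + y[:k].sum(), 1.0 / (b + k))
    lam2 = rng.gamma(a + y[k:].sum(), 1.0 / (b + (n - k)))
    # Discrete full conditional for the change point k.
    ks = np.arange(1, n)
    logp = (cum[ks - 1] * np.log(lam1) - ks * lam1
            + (cum[-1] - cum[ks - 1]) * np.log(lam2) - (n - ks) * lam2)
    p = np.exp(logp - logp.max())
    k = rng.choice(ks, p=p / p.sum())

print("posterior draw of change-point year index:", k)
```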
Abstract:
Predicting the future response of the Antarctic Ice Sheet to climate change requires an understanding of the ice streams that dominate its dynamics. Here we use cosmogenic isotope exposure-age dating (26Al, 10Be and 36Cl) of erratic boulders on ice-free land on James Ross Island, north-eastern Antarctic Peninsula, to define the evolution of Last Glacial Maximum (LGM) ice in the adjacent Prince Gustav Channel. These data include ice-sheet extent, thickness and dynamical behaviour. Prior to ∼18 ka, the LGM Antarctic Peninsula Ice Sheet extended to the continental shelf-edge and transported erratic boulders onto high-elevation mesas on James Ross Island. After ∼18 ka there was a period of rapid ice-sheet surface-lowering, coincident with the initiation of the Prince Gustav Ice Stream. This timing coincided with rapid increases in atmospheric temperature and eustatic sea-level rise around the Antarctic Peninsula. Collectively, these data provide evidence for a transition from a thick, cold-based LGM Antarctic Peninsula Ice Sheet to a thinner, partially warm-based ice sheet during deglaciation.
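The exposure ages behind such chronologies come from the standard simple-exposure equation relating nuclide concentration N, production rate P and decay constant λ; a minimal sketch with illustrative values (not the paper's measurements):

```python
# Sketch: zero-erosion exposure age from N = (P / lam) * (1 - exp(-lam * t)),
# i.e. t = -ln(1 - N*lam/P) / lam. Sample values are illustrative only.
import math

def exposure_age(n_atoms_per_g, production_rate, decay_const):
    """Exposure age in years (simple exposure, no erosion or burial)."""
    return -math.log(1.0 - n_atoms_per_g * decay_const / production_rate) / decay_const

LAMBDA_BE10 = math.log(2) / 1.387e6   # 10Be decay constant, 1/yr
# N in atoms/g of quartz, P in atoms/g/yr (placeholder numbers):
print(f"{exposure_age(9e4, 5.0, LAMBDA_BE10) / 1e3:.1f} ka")  # ~18 ka
```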
Abstract:
Mechanistic catchment-scale phosphorus models appear to perform poorly where diffuse sources dominate. We investigate the reasons for this for one model, INCA-P, testing model output against 18 months of daily data in a small Scottish catchment. We examine key model processes and provide recommendations for model improvement and simplification. Improvements to the particulate phosphorus simulation are especially needed. The model evaluation procedure is then generalised to provide a checklist for identifying why model performance may be poor or unreliable, incorporating calibration, data, structural and conceptual challenges. There needs to be greater recognition that current models struggle to produce positive Nash–Sutcliffe statistics in agricultural catchments when evaluated against daily data. Phosphorus modelling is difficult, but models are not as useless as this might suggest. We found a combination of correlation coefficients, bias, a comparison of distributions and a visual assessment of time series to be a better means of identifying realistic simulations.
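For reference, the Nash–Sutcliffe statistic mentioned above compares squared model errors with the variance of the observations; a minimal sketch on placeholder series (variable names and data are illustrative):

```python
# Sketch: evaluation metrics on placeholder daily obs/sim phosphorus series.
import numpy as np

def nash_sutcliffe(obs, sim):
    """NSE = 1 - sum((obs - sim)^2) / sum((obs - mean(obs))^2); 1 is perfect."""
    obs, sim = np.asarray(obs), np.asarray(sim)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

rng = np.random.default_rng(4)
obs = rng.lognormal(0.0, 0.8, 540)        # ~18 months of daily values (fake)
sim = obs * rng.lognormal(0.0, 0.5, 540)  # a noisy "model"

print(f"NSE:  {nash_sutcliffe(obs, sim):.2f}")  # easily negative for skewed errors
print(f"bias: {np.mean(sim - obs):.2f}")
print(f"r:    {np.corrcoef(obs, sim)[0, 1]:.2f}")
```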
Abstract:
Highly heterogeneous mountain snow distributions strongly affect soil moisture patterns; local ecology; and, ultimately, the timing, magnitude, and chemistry of stream runoff. Capturing these vital heterogeneities in a physically based distributed snow model requires appropriately scaled model structures. This work looks at how model scale—particularly the resolutions at which the forcing processes are represented—affects simulated snow distributions and melt. The research area is in the Reynolds Creek Experimental Watershed in southwestern Idaho. In this region, where there is a negative correlation between snow accumulation and melt rates, overall scale degradation pushed simulated melt to earlier in the season. The processes mainly responsible for snow distribution heterogeneity in this region—wind speed, wind-affected snow accumulations, thermal radiation, and solar radiation—were also independently rescaled to test process-specific spatiotemporal sensitivities. It was found that in order to accurately simulate snowmelt in this catchment, the snow cover needed to be resolved to 100 m. Wind and wind-affected precipitation—the primary influence on snow distribution—required similar resolution. Thermal radiation scaled with the vegetation structure (~100 m), while solar radiation was adequately modeled with 100–250-m resolution. Spatiotemporal sensitivities to model scale were found that allowed for further reductions in computational costs through the winter months with limited losses in accuracy. It was also shown that these modeling-based scale breaks could be associated with physiographic and vegetation structures to aid a priori modeling decisions.
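A sketch of the kind of scale-degradation experiment described, coarsening a forcing grid by block averaging (grid size, factors and field are illustrative assumptions, not the study's configuration):

```python
# Sketch: degrading a forcing field's resolution by block averaging.
import numpy as np

def coarsen(field, factor):
    """Average non-overlapping factor x factor blocks of a 2-D grid."""
    ny, nx = field.shape
    assert ny % factor == 0 and nx % factor == 0, "grid must divide evenly"
    return field.reshape(ny // factor, factor, nx // factor, factor).mean(axis=(1, 3))

rng = np.random.default_rng(5)
wind_50m = rng.lognormal(1.5, 0.4, size=(200, 200))  # 50 m wind-speed grid (fake)
wind_100m = coarsen(wind_50m, 2)   # 100 m: the resolution found adequate here
wind_250m = coarsen(wind_50m, 5)   # 250 m: the tested upper bound for solar terms
print(wind_100m.shape, wind_250m.shape)  # (100, 100) (40, 40)
```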