127 resultados para Data fusion applications
Resumo:
Pocket Data Mining (PDM) is our new term describing collaborative mining of streaming data in mobile and distributed computing environments. With sheer amounts of data streams are now available for subscription on our smart mobile phones, the potential of using this data for decision making using data stream mining techniques has now been achievable owing to the increasing power of these handheld devices. Wireless communication among these devices using Bluetooth and WiFi technologies has opened the door wide for collaborative mining among the mobile devices within the same range that are running data mining techniques targeting the same application. This paper proposes a new architecture that we have prototyped for realizing the significant applications in this area. We have proposed using mobile software agents in this application for several reasons. Most importantly the autonomic intelligent behaviour of the agent technology has been the driving force for using it in this application. Other efficiency reasons are discussed in details in this paper. Experimental results showing the feasibility of the proposed architecture are presented and discussed.
Resumo:
Collaborative mining of distributed data streams in a mobile computing environment is referred to as Pocket Data Mining PDM. Hoeffding trees techniques have been experimentally and analytically validated for data stream classification. In this paper, we have proposed, developed and evaluated the adoption of distributed Hoeffding trees for classifying streaming data in PDM applications. We have identified a realistic scenario in which different users equipped with smart mobile devices run a local Hoeffding tree classifier on a subset of the attributes. Thus, we have investigated the mining of vertically partitioned datasets with possible overlap of attributes, which is the more likely case. Our experimental results have validated the efficiency of our proposed model achieving promising accuracy for real deployment.
Resumo:
Distributed and collaborative data stream mining in a mobile computing environment is referred to as Pocket Data Mining PDM. Large amounts of available data streams to which smart phones can subscribe to or sense, coupled with the increasing computational power of handheld devices motivates the development of PDM as a decision making system. This emerging area of study has shown to be feasible in an earlier study using technological enablers of mobile software agents and stream mining techniques [1]. A typical PDM process would start by having mobile agents roam the network to discover relevant data streams and resources. Then other (mobile) agents encapsulating stream mining techniques visit the relevant nodes in the network in order to build evolving data mining models. Finally, a third type of mobile agents roam the network consulting the mining agents for a final collaborative decision, when required by one or more users. In this paper, we propose the use of distributed Hoeffding trees and Naive Bayes classifers in the PDM framework over vertically partitioned data streams. Mobile policing, health monitoring and stock market analysis are among the possible applications of PDM. An extensive experimental study is reported showing the effectiveness of the collaborative data mining with the two classifers.
Resumo:
Pocket Data Mining (PDM) describes the full process of analysing data streams in mobile ad hoc distributed environments. Advances in mobile devices like smart phones and tablet computers have made it possible for a wide range of applications to run in such an environment. In this paper, we propose the adoption of data stream classification techniques for PDM. Evident by a thorough experimental study, it has been proved that running heterogeneous/different, or homogeneous/similar data stream classification techniques over vertically partitioned data (data partitioned according to the feature space) results in comparable performance to batch and centralised learning techniques.
Resumo:
Advances in hardware and software in the past decade allow to capture, record and process fast data streams at a large scale. The research area of data stream mining has emerged as a consequence from these advances in order to cope with the real time analysis of potentially large and changing data streams. Examples of data streams include Google searches, credit card transactions, telemetric data and data of continuous chemical production processes. In some cases the data can be processed in batches by traditional data mining approaches. However, in some applications it is required to analyse the data in real time as soon as it is being captured. Such cases are for example if the data stream is infinite, fast changing, or simply too large in size to be stored. One of the most important data mining techniques on data streams is classification. This involves training the classifier on the data stream in real time and adapting it to concept drifts. Most data stream classifiers are based on decision trees. However, it is well known in the data mining community that there is no single optimal algorithm. An algorithm may work well on one or several datasets but badly on others. This paper introduces eRules, a new rule based adaptive classifier for data streams, based on an evolving set of Rules. eRules induces a set of rules that is constantly evaluated and adapted to changes in the data stream by adding new and removing old rules. It is different from the more popular decision tree based classifiers as it tends to leave data instances rather unclassified than forcing a classification that could be wrong. The ongoing development of eRules aims to improve its accuracy further through dynamic parameter setting which will also address the problem of changing feature domain values.
Resumo:
Airborne lidar provides accurate height information of objects on the earth and has been recognized as a reliable and accurate surveying tool in many applications. In particular, lidar data offer vital and significant features for urban land-cover classification, which is an important task in urban land-use studies. In this article, we present an effective approach in which lidar data fused with its co-registered images (i.e. aerial colour images containing red, green and blue (RGB) bands and near-infrared (NIR) images) and other derived features are used effectively for accurate urban land-cover classification. The proposed approach begins with an initial classification performed by the Dempster–Shafer theory of evidence with a specifically designed basic probability assignment function. It outputs two results, i.e. the initial classification and pseudo-training samples, which are selected automatically according to the combined probability masses. Second, a support vector machine (SVM)-based probability estimator is adopted to compute the class conditional probability (CCP) for each pixel from the pseudo-training samples. Finally, a Markov random field (MRF) model is established to combine spatial contextual information into the classification. In this stage, the initial classification result and the CCP are exploited. An efficient belief propagation (EBP) algorithm is developed to search for the global minimum-energy solution for the maximum a posteriori (MAP)-MRF framework in which three techniques are developed to speed up the standard belief propagation (BP) algorithm. Lidar and its co-registered data acquired by Toposys Falcon II are used in performance tests. The experimental results prove that fusing the height data and optical images is particularly suited for urban land-cover classification. There is no training sample needed in the proposed approach, and the computational cost is relatively low. An average classification accuracy of 93.63% is achieved.
Resumo:
The assimilation of measurements from the stratosphere and mesosphere is becoming increasingly common as the lids of weather prediction and climate models rise into the mesosphere and thermosphere. However, the dynamics of the middle atmosphere pose specific challenges to the assimilation of measurements from this region. Forecast-error variances can be very large in the mesosphere and this can render assimilation schemes very sensitive to the details of the specification of forecast error correlations. An example is shown where observations in the stratosphere are able to produce increments in the mesosphere. Such sensitivity of the assimilation scheme to misspecification of covariances can also amplify any existing biases in measurements or forecasts. Since both models and measurements of the middle atmosphere are known to have biases, the separation of these sources of bias remains a issue. Finally, well-known deficiencies of assimilation schemes, such as the production of imbalanced states or the assumption of zero bias, are proposed explanations for the inaccurate transport resulting from assimilated winds. The inability of assimilated winds to accurately transport constituents in the middle atmosphere remains a fundamental issue limiting the use of assimilated products for applications involving longer time-scales.
Resumo:
Global communicationrequirements andloadimbalanceof someparalleldataminingalgorithms arethe major obstacles to exploitthe computational power of large-scale systems. This work investigates how non-uniform data distributions can be exploited to remove the global communication requirement and to reduce the communication costin parallel data mining algorithms and, in particular, in the k-means algorithm for cluster analysis. In the straightforward parallel formulation of the k-means algorithm, data and computation loads are uniformly distributed over the processing nodes. This approach has excellent load balancing characteristics that may suggest it could scale up to large and extreme-scale parallel computing systems. However, at each iteration step the algorithm requires a global reduction operationwhichhinders thescalabilityoftheapproach.Thisworkstudiesadifferentparallelformulation of the algorithm where the requirement of global communication is removed, while maintaining the same deterministic nature ofthe centralised algorithm. The proposed approach exploits a non-uniform data distribution which can be either found in real-world distributed applications or can be induced by means ofmulti-dimensional binary searchtrees. The approachcanalso be extended to accommodate an approximation error which allows a further reduction ofthe communication costs. The effectiveness of the exact and approximate methods has been tested in a parallel computing system with 64 processors and in simulations with 1024 processing element
Resumo:
A method of classifying the upper tropospheric/lower stratospheric (UTLS) jets has been developed that allows satellite and aircraft trace gas data and meteorological fields to be efficiently mapped in a jet coordinate view. A detailed characterization of multiple tropopauses accompanies the jet characterization. Jet climatologies show the well-known high altitude subtropical and lower altitude polar jets in the upper troposphere, as well as a pattern of concentric polar and subtropical jets in the Southern Hemisphere, and shifts of the primary jet to high latitudes associated with blocking ridges in Northern Hemisphere winter. The jet-coordinate view segregates air masses differently than the commonly-used equivalent latitude (EqL) coordinate throughout the lowermost stratosphere and in the upper troposphere. Mapping O3 data from the Aura Microwave Limb Sounder (MLS) satellite and the Winter Storms aircraft datasets in jet coordinates thus emphasizes different aspects of the circulation compared to an EqL-coordinate framework: the jet coordinate reorders the data geometrically, thus highlighting the strong PV, tropopause height and trace gas gradients across the subtropical jet, whereas EqL is a dynamical coordinate that may blur these spatial relationships but provides information on irreversible transport. The jet coordinate view identifies the concentration of stratospheric ozone well below the tropopause in the region poleward of and below the jet core, as well as other transport features associated with the upper tropospheric jets. Using the jet information in EqL coordinates allows us to study trace gas distributions in regions of weak versus strong jets, and demonstrates weaker transport barriers in regions with less jet influence. MLS and Atmospheric Chemistry Experiment-Fourier Transform Spectrometer trace gas fields for spring 2008 in jet coordinates show very strong, closely correlated, PV, tropopause height and trace gas gradients across the jet, and evidence of intrusions of stratospheric air below the tropopause below and poleward of the subtropical jet; these features are consistent between instruments and among multiple trace gases. Our characterization of the jets is facilitating studies that will improve our understanding of upper tropospheric trace gas evolution.
Resumo:
It is well known that there is a dynamic relationship between cerebral blood flow (CBF) and cerebral blood volume (CBV). With increasing applications of functional MRI, where the blood oxygen-level-dependent signals are recorded, the understanding and accurate modeling of the hemodynamic relationship between CBF and CBV becomes increasingly important. This study presents an empirical and data-based modeling framework for model identification from CBF and CBV experimental data. It is shown that the relationship between the changes in CBF and CBV can be described using a parsimonious autoregressive with exogenous input model structure. It is observed that neither the ordinary least-squares (LS) method nor the classical total least-squares (TLS) method can produce accurate estimates from the original noisy CBF and CBV data. A regularized total least-squares (RTLS) method is thus introduced and extended to solve such an error-in-the-variables problem. Quantitative results show that the RTLS method works very well on the noisy CBF and CBV data. Finally, a combination of RTLS with a filtering method can lead to a parsimonious but very effective model that can characterize the relationship between the changes in CBF and CBV.
Resumo:
n the past decade, the analysis of data has faced the challenge of dealing with very large and complex datasets and the real-time generation of data. Technologies to store and access these complex and large datasets are in place. However, robust and scalable analysis technologies are needed to extract meaningful information from these datasets. The research field of Information Visualization and Visual Data Analytics addresses this need. Information visualization and data mining are often used complementary to each other. Their common goal is the extraction of meaningful information from complex and possibly large data. However, though data mining focuses on the usage of silicon hardware, visualization techniques also aim to access the powerful image-processing capabilities of the human brain. This article highlights the research on data visualization and visual analytics techniques. Furthermore, we highlight existing visual analytics techniques, systems, and applications including a perspective on the field from the chemical process industry.
Resumo:
The absorption coefficient of a substance distributed as discrete particles in suspension is less than that of the same material dissolved uniformly in a medium—a phenomenon commonly referred to as the flattening effect. The decrease in the absorption coefficient owing to flattening effect depends on the concentration of the absorbing pigment inside the particle, the specific absorption coefficient of the pigment within the particle, and on the diameter of the particle, if the particles are assumed to be spherical. For phytoplankton cells in the ocean, with diameters ranging from less than 1 µm to more than 100 µm, the flattening effect is variable, and sometimes pronounced, as has been well documented in the literature. Here, we demonstrate how the in vivo absorption coefficient of phytoplankton cells per unit concentration of its major pigment, chlorophyll a, can be used to determine the average cell size of the phytoplankton population. Sensitivity analyses are carried out to evaluate the errors in the estimated diameter owing to potential errors in the model assumptions. Cell sizes computed for field samples using the model are compared qualitatively with indirect estimates of size classes derived from high performance liquid chromatography data. Also, the results are compared quantitatively against measurements of cell size in laboratory cultures. The method developed is easy-to-apply as an operational tool for in situ observations, and has the potential for application to remote sensing of ocean colour data.
Resumo:
Aim Earth observation (EO) products are a valuable alternative to spectral vegetation indices. We discuss the availability of EO products for analysing patterns in macroecology, particularly related to vegetation, on a range of spatial and temporal scales. Location Global. Methods We discuss four groups of EO products: land cover/cover change, vegetation structure and ecosystem productivity, fire detection, and digital elevation models. We address important practical issues arising from their use, such as assumptions underlying product generation, product accuracy and product transferability between spatial scales. We investigate the potential of EO products for analysing terrestrial ecosystems. Results Land cover, productivity and fire products are generated from long-term data using standardized algorithms to improve reliability in detecting change of land surfaces. Their global coverage renders them useful for macroecology. Their spatial resolution (e.g. GLOBCOVER vegetation, 300 m; MODIS vegetation and fire, ≥ 500 m; ASTER digital elevation, 30 m) can be a limiting factor. Canopy structure and productivity products are based on physical approaches and thus are independent of biome-specific calibrations. Active fire locations are provided in near-real time, while burnt area products show actual area burnt by fire. EO products can be assimilated into ecosystem models, and their validation information can be employed to calculate uncertainties during subsequent modelling. Main conclusions Owing to their global coverage and long-term continuity, EO end products can significantly advance the field of macroecology. EO products allow analyses of spatial biodiversity, seasonal dynamics of biomass and productivity, and consequences of disturbances on regional to global scales. Remaining drawbacks include inter-operability between products from different sensors and accuracy issues due to differences between assumptions and models underlying the generation of different EO products. Our review explains the nature of EO products and how they relate to particular ecological variables across scales to encourage their wider use in ecological applications.
Resumo:
To bridge the gaps between traditional mesoscale modelling and microscale modelling, the National Center for Atmospheric Research, in collaboration with other agencies and research groups, has developed an integrated urban modelling system coupled to the weather research and forecasting (WRF) model as a community tool to address urban environmental issues. The core of this WRF/urban modelling system consists of the following: (1) three methods with different degrees of freedom to parameterize urban surface processes, ranging from a simple bulk parameterization to a sophisticated multi-layer urban canopy model with an indoor–outdoor exchange sub-model that directly interacts with the atmospheric boundary layer, (2) coupling to fine-scale computational fluid dynamic Reynolds-averaged Navier–Stokes and Large-Eddy simulation models for transport and dispersion (T&D) applications, (3) procedures to incorporate high-resolution urban land use, building morphology, and anthropogenic heating data using the National Urban Database and Access Portal Tool (NUDAPT), and (4) an urbanized high-resolution land data assimilation system. This paper provides an overview of this modelling system; addresses the daunting challenges of initializing the coupled WRF/urban model and of specifying the potentially vast number of parameters required to execute the WRF/urban model; explores the model sensitivity to these urban parameters; and evaluates the ability of WRF/urban to capture urban heat islands, complex boundary-layer structures aloft, and urban plume T&D for several major metropolitan regions. Recent applications of this modelling system illustrate its promising utility, as a regional climate-modelling tool, to investigate impacts of future urbanization on regional meteorological conditions and on air quality under future climate change scenarios. Copyright © 2010 Royal Meteorological Society
Resumo:
This paper introduces a novel approach for free-text keystroke dynamics authentication which incorporates the use of the keyboard’s key-layout. The method extracts timing features from specific key-pairs. The Euclidean distance is then utilized to find the level of similarity between a user’s profile data and his/her test data. The results obtained from this method are reasonable for free-text authentication while maintaining the maximum level of user relaxation. Moreover, it has been proven in this study that flight time yields better authentication results when compared with dwell time. In particular, the results were obtained with only one training sample for the purpose of practicality and ease of real life application.