135 resultados para data gathering algorithm
Resumo:
Electricity cost has become a major expense for running data centers and server consolidation using virtualization technology has been used as an important technology to improve the energy efficiency of data centers. In this research, a genetic algorithm and a simulation-annealing algorithm are proposed for the static virtual machine placement problem that considers the energy consumption in both the servers and the communication network, and a trading algorithm is proposed for dynamic virtual machine placement. Experimental results have shown that the proposed methods are more energy efficient than existing solutions.
Resumo:
The promise of ‘big data’ has generated a significant deal of interest in the development of new approaches to research in the humanities and social sciences, as well as a range of important critical interventions which warn of an unquestioned rush to ‘big data’. Drawing on the experiences made in developing innovative ‘big data’ approaches to social media research, this paper examines some of the repercussions for the scholarly research and publication practices of those researchers who do pursue the path of ‘big data’–centric investigation in their work. As researchers import the tools and methods of highly quantitative, statistical analysis from the ‘hard’ sciences into computational, digital humanities research, must they also subscribe to the language and assumptions underlying such ‘scientificity’? If so, how does this affect the choices made in gathering, processing, analysing, and disseminating the outcomes of digital humanities research? In particular, is there a need to rethink the forms and formats of publishing scholarly work in order to enable the rigorous scrutiny and replicability of research outcomes?
Resumo:
Bioacoustic data can provide an important base for environmental monitoring. To explore a large amount of field recordings collected, an automated similarity search algorithm is presented in this paper. A region of an audio defined by frequency and time bounds is provided by a user; the content of the region is used to construct a query. In the retrieving process, our algorithm will automatically scan through recordings to search for similar regions. In detail, we present a feature extraction approach based on the visual content of vocalisations – in this case ridges, and develop a generic regional representation of vocalisations for indexing. Our feature extraction method works best for bird vocalisations showing ridge characteristics. The regional representation method allows the content of an arbitrary region of a continuous recording to be described in a compressed format.
Resumo:
Spatial organisation of proteins according to their function plays an important role in the specificity of their molecular interactions. Emerging proteomics methods seek to assign proteins to sub-cellular locations by partial separation of organelles and computational analysis of protein abundance distributions among partially separated fractions. Such methods permit simultaneous analysis of unpurified organelles and promise proteome-wide localisation in scenarios wherein perturbation may prompt dynamic re-distribution. Resolving organelles that display similar behavior during a protocol designed to provide partial enrichment represents a possible shortcoming. We employ the Localisation of Organelle Proteins by Isotope Tagging (LOPIT) organelle proteomics platform to demonstrate that combining information from distinct separations of the same material can improve organelle resolution and assignment of proteins to sub-cellular locations. Two previously published experiments, whose distinct gradients are alone unable to fully resolve six known protein-organelle groupings, are subjected to a rigorous analysis to assess protein-organelle association via a contemporary pattern recognition algorithm. Upon straightforward combination of single-gradient data, we observe significant improvement in protein-organelle association via both a non-linear support vector machine algorithm and partial least-squares discriminant analysis. The outcome yields suggestions for further improvements to present organelle proteomics platforms, and a robust analytical methodology via which to associate proteins with sub-cellular organelles.
Resumo:
Transit passenger market segmentation enables transit operators to target different classes of transit users to provide customized information and services. The Smart Card (SC) data, from Automated Fare Collection system, facilitates the understanding of multiday travel regularity of transit passengers, and can be used to segment them into identifiable classes of similar behaviors and needs. However, the use of SC data for market segmentation has attracted very limited attention in the literature. This paper proposes a novel methodology for mining spatial and temporal travel regularity from each individual passenger’s historical SC transactions and segments them into four segments of transit users. After reconstructing the travel itineraries from historical SC transactions, the paper adopts the Density-Based Spatial Clustering of Application with Noise (DBSCAN) algorithm to mine travel regularity of each SC user. The travel regularity is then used to segment SC users by an a priori market segmentation approach. The methodology proposed in this paper assists transit operators to understand their passengers and provide them oriented information and services.
Resumo:
Technological advances have led to an influx of affordable hardware that supports sensing, computation and communication. This hardware is increasingly deployed in public and private spaces, tracking and aggregating a wealth of real-time environmental data. Although these technologies are the focus of several research areas, there is a lack of research dealing with the problem of making these capabilities accessible to everyday users. This thesis represents a first step towards developing systems that will allow users to leverage the available infrastructure and create custom tailored solutions. It explores how this notion can be utilized in the context of energy monitoring to improve conventional approaches. The project adopted a user-centered design process to inform the development of a flexible system for real-time data stream composition and visualization. This system features an extensible architecture and defines a unified API for heterogeneous data streams. Rather than displaying the data in a predetermined fashion, it makes this information available as building blocks that can be combined and shared. It is based on the insight that individual users have diverse information needs and presentation preferences. Therefore, it allows users to compose rich information displays, incorporating personally relevant data from an extensive information ecosystem. The prototype was evaluated in an exploratory study to observe its natural use in a real-world setting, gathering empirical usage statistics and conducting semi-structured interviews. The results show that a high degree of customization does not warrant sustained usage. Other factors were identified, yielding recommendations for increasing the impact on energy consumption.
Resumo:
In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. Towards this, Webput utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods for the purpose of formulating web search queries that are capable of effectively retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, for improving the accuracy and efficiency of WebPut. Moreover, several optimization techniques are also proposed to reduce the cost of estimating the confidence of imputation queries at both the tuple-level and the database-level. Experiments based on several real-world data collections demonstrate not only the effectiveness of WebPut compared to existing approaches, but also the efficiency of our proposed algorithms and optimization techniques.
Resumo:
To enhance the performance of the k-nearest neighbors approach in forecasting short-term traffic volume, this paper proposed and tested a two-step approach with the ability of forecasting multiple steps. In selecting k-nearest neighbors, a time constraint window is introduced, and then local minima of the distances between the state vectors are ranked to avoid overlappings among candidates. Moreover, to control extreme values’ undesirable impact, a novel algorithm with attractive analytical features is developed based on the principle component. The enhanced KNN method has been evaluated using the field data, and our comparison analysis shows that it outperformed the competing algorithms in most cases.
Resumo:
The K-means algorithm is one of the most popular techniques in clustering. Nevertheless, the performance of the K-means algorithm depends highly on initial cluster centers and converges to local minima. This paper proposes a hybrid evolutionary programming based clustering algorithm, called PSO-SA, by combining particle swarm optimization (PSO) and simulated annealing (SA). The basic idea is to search around the global solution by SA and to increase the information exchange among particles using a mutation operator to escape local optima. Three datasets, Iris, Wisconsin Breast Cancer, and Ripley’s Glass, have been considered to show the effectiveness of the proposed clustering algorithm in providing optimal clusters. The simulation results show that the PSO-SA clustering algorithm not only has a better response but also converges more quickly than the K-means, PSO, and SA algorithms.
Resumo:
Reliability of carrier phase ambiguity resolution (AR) of an integer least-squares (ILS) problem depends on ambiguity success rate (ASR), which in practice can be well approximated by the success probability of integer bootstrapping solutions. With the current GPS constellation, sufficiently high ASR of geometry-based model can only be achievable at certain percentage of time. As a result, high reliability of AR cannot be assured by the single constellation. In the event of dual constellations system (DCS), for example, GPS and Beidou, which provide more satellites in view, users can expect significant performance benefits such as AR reliability and high precision positioning solutions. Simply using all the satellites in view for AR and positioning is a straightforward solution, but does not necessarily lead to high reliability as it is hoped. The paper presents an alternative approach that selects a subset of the visible satellites to achieve a higher reliability performance of the AR solutions in a multi-GNSS environment, instead of using all the satellites. Traditionally, satellite selection algorithms are mostly based on the position dilution of precision (PDOP) in order to meet accuracy requirements. In this contribution, some reliability criteria are introduced for GNSS satellite selection, and a novel satellite selection algorithm for reliable ambiguity resolution (SARA) is developed. The SARA algorithm allows receivers to select a subset of satellites for achieving high ASR such as above 0.99. Numerical results from a simulated dual constellation cases show that with the SARA procedure, the percentages of ASR values in excess of 0.99 and the percentages of ratio-test values passing the threshold 3 are both higher than those directly using all satellites in view, particularly in the case of dual-constellation, the percentages of ASRs (>0.99) and ratio-test values (>3) could be as high as 98.0 and 98.5 % respectively, compared to 18.1 and 25.0 % without satellite selection process. It is also worth noting that the implementation of SARA is simple and the computation time is low, which can be applied in most real-time data processing applications.
Resumo:
Using Media-Access-Control (MAC) address for data collection and tracking is a capable and cost effective approach as the traditional ways such as surveys and video surveillance have numerous drawbacks and limitations. Positioning cell-phones by Global System for Mobile communication was considered an attack on people's privacy. MAC addresses just keep a unique log of a WiFi or Bluetooth enabled device for connecting to another device that has not potential privacy infringements. This paper presents the use of MAC address data collection approach for analysis of spatio-temporal dynamics of human in terms of shared space utilization. This paper firstly discuses the critical challenges and key benefits of MAC address data as a tracking technology for monitoring human movement. Here, proximity-based MAC address tracking is postulated as an effective methodology for analysing the complex spatio-temporal dynamics of human movements at shared zones such as lounge and office areas. A case study of university staff lounge area is described in detail and results indicates a significant added value of the methodology for human movement tracking. By analysis of MAC address data in the study area, clear statistics such as staff’s utilisation frequency, utilisation peak periods, and staff time spent is obtained. The analyses also reveal staff’s socialising profiles in terms of group and solo gathering. The paper is concluded with a discussion on why MAC address tracking offers significant advantages for tracking human behaviour in terms of shared space utilisation with respect to other and more prominent technologies, and outlines some of its remaining deficiencies.
Resumo:
The placement of the mappers and reducers on the machines directly affects the performance and cost of the MapReduce computation in cloud computing. From the computational point of view, the mappers/reducers placement problem is a generalization of the classical bin packing problem, which is NP-complete. Thus, in this paper we propose a new heuristic algorithm for the mappers/reducers placement problem in cloud computing and evaluate it by comparing with other several heuristics on solution quality and computation time by solving a set of test problems with various characteristics. The computational results show that our heuristic algorithm is much more efficient than the other heuristics. Also, we verify the effectiveness of our heuristic algorithm by comparing the mapper/reducer placement for a benchmark problem generated by our heuristic algorithm with a conventional mapper/reducer placement. The comparison results show that the computation using our mapper/reducer placement is much cheaper while still satisfying the computation deadline.
Resumo:
MapReduce is a computation model for processing large data sets in parallel on large clusters of machines, in a reliable, fault-tolerant manner. A MapReduce computation is broken down into a number of map tasks and reduce tasks, which are performed by so called mappers and reducers, respectively. The placement of the mappers and reducers on the machines directly affects the performance and cost of the MapReduce computation. From the computational point of view, the mappers/reducers placement problem is a generation of the classical bin packing problem, which is NPcomplete. Thus, in this paper we propose a new grouping genetic algorithm for the mappers/reducers placement problem in cloud computing. Compared with the original one, our grouping genetic algorithm uses an innovative coding scheme and also eliminates the inversion operator which is an essential operator in the original grouping genetic algorithm. The new grouping genetic algorithm is evaluated by experiments and the experimental results show that it is much more efficient than four popular algorithms for the problem, including the original grouping genetic algorithm.
Resumo:
A Software-as-a-Service or SaaS can be delivered in a composite form, consisting of a set of application and data components that work together to deliver higher-level functional software. Components in a composite SaaS may need to be scaled – replicated or deleted, to accommodate the user’s load. It may not be necessary to replicate all components of the SaaS, as some components can be shared by other instances. On the other hand, when the load is low, some of the instances may need to be deleted to avoid resource underutilisation. Thus, it is important to determine which components are to be scaled such that the performance of the SaaS is still maintained. Extensive research on the SaaS resource management in Cloud has not yet addressed the challenges of scaling process for composite SaaS. Therefore, a hybrid genetic algorithm is proposed in which it utilises the problem’s knowledge and explores the best combination of scaling plan for the components. Experimental results demonstrate that the proposed algorithm outperforms existing heuristic-based solutions.
Accelerometer data reduction : a comparison of four reduction algorithms on select outcome variables
Resumo:
Purpose Accelerometers are recognized as a valid and objective tool to assess free-living physical activity. Despite the widespread use of accelerometers, there is no standardized way to process and summarize data from them, which limits our ability to compare results across studies. This paper a) reviews decision rules researchers have used in the past, b) compares the impact of using different decision rules on a common data set, and c) identifies issues to consider for accelerometer data reduction. Methods The methods sections of studies published in 2003 and 2004 were reviewed to determine what decision rules previous researchers have used to identify wearing period, minimal wear requirement for a valid day, spurious data, number of days used to calculate the outcome variables, and extract bouts of moderate to vigorous physical activity (MVPA). For this study, four data reduction algorithms that employ different decision rules were used to analyze the same data set. Results The review showed that among studies that reported their decision rules, much variability was observed. Overall, the analyses suggested that using different algorithms impacted several important outcome variables. The most stringent algorithm yielded significantly lower wearing time, the lowest activity counts per minute and counts per day, and fewer minutes of MVPA per day. An exploratory sensitivity analysis revealed that the most stringent inclusion criterion had an impact on sample size and wearing time, which in turn affected many outcome variables. Conclusions These findings suggest that the decision rules employed to process accelerometer data have a significant impact on important outcome variables. Until guidelines are developed, it will remain difficult to compare findings across studies