814 results for data gathering algorithm


Relevance:

30.00%

Publisher:

Abstract:

Software as a Service (SaaS) in the Cloud has recently become increasingly significant among software users and providers. A SaaS delivered as a composite application has many benefits, including reduced delivery costs, flexible offers of SaaS functions and decreased subscription costs for users. However, this approach introduces a new problem in managing the resources allocated to the composite SaaS. Resources allocated at the initial stage may become overloaded or wasted due to the dynamic environment of a Cloud. A typical data center resource manager usually triggers a placement reconfiguration for the SaaS in order to maintain its performance as well as to minimize the resources used. Existing approaches to this problem often ignore the underlying dependencies between SaaS components. In addition, the reconfiguration has to comply with SaaS constraints in terms of resource requirements, placement requirements and SLAs. To tackle this problem, this paper proposes a penalty-based Grouping Genetic Algorithm for clustering the components of multiple composite SaaS applications in the Cloud. The main objective is to minimize the resources used by the SaaS by clustering its components without violating any constraints. Experimental results demonstrate the feasibility and scalability of the proposed algorithm.
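
To make the penalty idea concrete, here is a minimal sketch of a penalty-weighted evolutionary loop for component-to-server grouping. All values (component demands, uniform server capacity, penalty weight) and the mutation-only search are illustrative assumptions, not the paper's Grouping Genetic Algorithm:

```python
import random

# Illustrative inputs (not from the paper): resource demand of each SaaS
# component and a uniform server capacity.
DEMANDS = [4, 8, 2, 6, 3, 5, 7, 1]
CAPACITY = 10
PENALTY = 100          # weight per unit of capacity violation

def fitness(assignment):
    """Penalty-based fitness: number of servers used, plus a penalty for
    every unit of demand exceeding a server's capacity (lower is better)."""
    loads = {}
    for comp, server in enumerate(assignment):
        loads[server] = loads.get(server, 0) + DEMANDS[comp]
    overload = sum(max(0, load - CAPACITY) for load in loads.values())
    return len(loads) + PENALTY * overload

def mutate(assignment, n_servers):
    """Move one randomly chosen component to a random server."""
    child = assignment[:]
    child[random.randrange(len(child))] = random.randrange(n_servers)
    return child

def evolve(pop_size=30, generations=200):
    n_servers = len(DEMANDS)   # at worst, one component per server
    pop = [[random.randrange(n_servers) for _ in DEMANDS]
           for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness)
        elite = pop[:pop_size // 2]          # keep the better half
        pop = elite + [mutate(random.choice(elite), n_servers)
                       for _ in range(pop_size - len(elite))]
    return min(pop, key=fitness)

best = evolve()
print(best, fitness(best))   # e.g. a 4-server clustering with no overload
```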

Relevance:

30.00%

Publisher:

Abstract:

Improving energy efficiency has become increasingly important in data centers in recent years as a way to rein in rapidly growing electricity consumption. The power dissipation of the physical servers is the root cause of the power usage of other systems, such as cooling systems. Many efforts have been made to make data centers more energy efficient. One of them is to minimize the total power consumption of these servers through virtual machine consolidation, which is implemented by virtual machine placement. The placement problem is often modeled as a bin packing problem. Due to the NP-hard nature of the problem, heuristics such as the First Fit and Best Fit algorithms have often been used and generally produce good results. However, their performance leaves room for further improvement. In this paper we propose a Simulated Annealing (SA) based algorithm, which aims to further improve any feasible placement. This is the first published attempt to use SA to solve the VM placement problem for power-consumption optimization. Experimental results show that this SA algorithm can generate better results, saving up to 25% more energy than First Fit Decreasing within an acceptable time frame.
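
A minimal sketch of the SA idea applied to VM placement, assuming a linear host power model with made-up demands and wattages (not the paper's model or parameter values):

```python
import math
import random

# Illustrative model (not the paper's): each VM has a CPU demand, and a
# switched-on host draws idle power plus a term linear in its utilisation.
VM_DEMAND = [2, 4, 4, 6, 8, 3, 5, 7]
HOST_CAP = 16
P_IDLE, P_PEAK = 100.0, 250.0    # watts

def power(placement):
    """Total power of all hosts in use; infeasible placements get
    infinite cost so they are never accepted from a feasible state."""
    loads = {}
    for vm, host in enumerate(placement):
        loads[host] = loads.get(host, 0) + VM_DEMAND[vm]
    if any(load > HOST_CAP for load in loads.values()):
        return float("inf")
    return sum(P_IDLE + (P_PEAK - P_IDLE) * load / HOST_CAP
               for load in loads.values())

def anneal(placement, n_hosts, t=50.0, cooling=0.995, steps=20000):
    best = current = placement[:]
    for _ in range(steps):
        candidate = current[:]  # neighbour: move one VM to a random host
        candidate[random.randrange(len(candidate))] = random.randrange(n_hosts)
        delta = power(candidate) - power(current)
        # accept improvements always, worse moves with Boltzmann probability
        if delta < 0 or random.random() < math.exp(-delta / t):
            current = candidate
            if power(current) < power(best):
                best = current[:]
        t *= cooling
    return best

start = list(range(len(VM_DEMAND)))   # trivial feasible start: one VM per host
final = anneal(start, n_hosts=len(VM_DEMAND))
print(final, power(final))            # consolidates onto fewer hosts
```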

Relevance:

30.00%

Publisher:

Abstract:

Here we present a sequential Monte Carlo approach to Bayesian sequential design that incorporates model uncertainty. The methodology is demonstrated through the development and implementation of two model-discrimination utilities, mutual information and total separation, but it can also be applied more generally if one has different experimental aims. A sequential Monte Carlo algorithm is run for each rival model in parallel, and provides a convenient estimate of the marginal likelihood of each model given the data, which can be used for model comparison and in the evaluation of utility functions. A major benefit of this approach is that it requires very little problem-specific tuning and is also computationally efficient when compared to full Markov chain Monte Carlo approaches. This research is motivated by applications in drug development and chemical engineering.
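
The quantity driving both utilities is the marginal likelihood of each rival model; for reference, a sketch of the standard SMC evidence estimate, in our notation rather than the paper's:

```latex
% Running one SMC sampler per rival model m, the unnormalised particle
% weights w_k^{(i)} at each data-assimilation step k yield the evidence
% estimate as a running product of weight averages over N particles:
\[
  \hat{p}(y_{1:t} \mid m) \;=\; \prod_{k=1}^{t} \frac{1}{N} \sum_{i=1}^{N} w_k^{(i)},
  \qquad
  \hat{p}(m \mid y_{1:t}) \;=\;
  \frac{\hat{p}(y_{1:t} \mid m)\, p(m)}{\sum_{m'} \hat{p}(y_{1:t} \mid m')\, p(m')}.
\]
```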

Relevance:

30.00%

Publisher:

Abstract:

A simple and effective down-sampling algorithm, the Peak-Hold-Down-Sample (PHDS) algorithm, is developed in this paper to enable rapid and efficient data transfer in remote condition monitoring applications. The algorithm is particularly useful for high-frequency Condition Monitoring (CM) techniques and for low-speed machine applications, since the combination of a high sampling frequency and a low rotating speed generally leads to large, unwieldy data sets. The effectiveness of the algorithm was evaluated and tested on four data sets. One was extracted from the condition monitoring signal of a practical industrial application. Another was acquired from a low-speed machine test rig in the laboratory. The remaining two were computer-simulated bearing defect signals containing either a single defect or multiple bearing defects. The results show that the PHDS algorithm can substantially reduce the data size while preserving the critical bearing defect information for all the data sets used in this work, even at a large down-sample ratio (e.g., 500 times). In contrast, conventional down-sampling eliminates useful and critical information, such as bearing defect frequencies, at the same down-sample ratio, and also introduces noise and artificial frequency components, which limits its usefulness for machine condition monitoring applications.
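
A minimal sketch of the peak-hold idea, assuming block-wise processing with a signed-peak rule; the function name and the simulated signal are ours, not the paper's implementation:

```python
import numpy as np

def peak_hold_downsample(signal, ratio):
    """Down-sample by keeping the sample with the largest absolute value
    in each block of `ratio` consecutive samples, so that impulsive
    bearing-defect content survives the rate reduction."""
    n_blocks = len(signal) // ratio
    blocks = np.asarray(signal[:n_blocks * ratio]).reshape(n_blocks, ratio)
    idx = np.argmax(np.abs(blocks), axis=1)      # peak location per block
    return blocks[np.arange(n_blocks), idx]      # signed peak values

# Example: a 500x down-sample of a simulated impulsive signal
fs = 100_000
sig = 0.01 * np.random.randn(fs)
sig[::7000] += 1.0                               # sparse defect impulses
out = peak_hold_downsample(sig, 500)
print(out.shape)                                 # -> (200,), impulses retained
```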

Relevance:

30.00%

Publisher:

Abstract:

Background: Cancer outlier profile analysis (COPA) has proven to be an effective approach to analyzing cancer expression data, leading to the discovery of the TMPRSS2 and ETS family gene fusion events in prostate cancer. However, the original COPA algorithm did not identify down-regulated outliers, and the currently available R package implementing the method is similarly restricted to the analysis of over-expressed outliers. Here we present a modified outlier detection method, mCOPA, which contains refinements to the outlier-detection algorithm, identifies both over- and under-expressed outliers, is freely available, and can be applied to any expression dataset.

Results: We compare our method to other feature-selection approaches and demonstrate that mCOPA frequently selects more informative features than differential-expression or variance-based feature selection, and recovers observed clinical subtypes more consistently. We demonstrate the application of mCOPA to prostate cancer expression data, and explore the use of outliers in clustering, pathway analysis, and the identification of tumour suppressors. We analyse the under-expressed outliers to identify known and novel prostate cancer tumour suppressor genes, validating these against data in Oncomine and the Cancer Gene Index. We also demonstrate how a combination of outlier analysis and pathway analysis can identify molecular mechanisms disrupted in individual tumours.

Conclusions: We demonstrate that mCOPA offers advantages over differential expression or variance in selecting outlier features, and that the features so selected are better able to assign samples to clinically annotated subtypes. Further, we show that the biology explored by outlier analysis differs from that uncovered by differential-expression or variance analysis. mCOPA is an important new tool for the exploration of cancer datasets and the discovery of new cancer subtypes, and can be combined with pathway and functional analysis approaches to discover the mechanisms underpinning heterogeneity in cancers.
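
A sketch of a COPA-style transform extended to score both tails, following the published description of COPA (median centring, MAD scaling, percentile scoring); the thresholds and names are ours, not taken from the mCOPA package:

```python
import numpy as np

def copa_scores(expr, percentile=95):
    """COPA-style outlier scoring: centre each gene (row) at its median,
    scale by its median absolute deviation, then score genes by a high
    percentile of the transformed values (over-expressed tail) and the
    mirrored low percentile (under-expressed tail)."""
    med = np.median(expr, axis=1, keepdims=True)
    mad = np.median(np.abs(expr - med), axis=1, keepdims=True)
    transformed = (expr - med) / (mad + 1e-12)       # guard against MAD == 0
    up = np.percentile(transformed, percentile, axis=1)
    down = np.percentile(transformed, 100 - percentile, axis=1)
    return up, down

# Toy usage: 1000 genes x 60 samples of synthetic expression data
expr = np.random.lognormal(size=(1000, 60))
up, down = copa_scores(expr)
print(up.argsort()[-5:])    # candidate over-expressed outlier genes
print(down.argsort()[:5])   # candidate under-expressed outlier genes
```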

Relevance:

30.00%

Publisher:

Abstract:

In this paper, we present WebPut, a prototype system that adopts a novel web-based approach to the data imputation problem. To this end, WebPut utilizes the available information in an incomplete database in conjunction with the data consistency principle. Moreover, WebPut extends effective Information Extraction (IE) methods to formulate web search queries capable of retrieving missing values with high accuracy. WebPut employs a confidence-based scheme that efficiently leverages our suite of data imputation queries to automatically select the most effective imputation query for each missing value. A greedy iterative algorithm is also proposed to schedule the imputation order of the different missing values in a database, and in turn the issuing of their corresponding imputation queries, to improve the accuracy and efficiency of WebPut. Experiments based on several real-world data collections demonstrate that WebPut outperforms existing approaches.
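
A sketch of a confidence-driven greedy imputation loop in the spirit of the scheduler described above; this is our reconstruction from the abstract, and `Query`, `candidate_queries` and `run_query` are hypothetical stand-ins for WebPut's query-formulation and web-extraction components:

```python
from dataclasses import dataclass

@dataclass
class Query:
    cell: tuple          # (row_id, attribute) of the missing value
    text: str            # formulated web search query
    confidence: float    # score used by the greedy scheduler

def candidate_queries(cell, filled):
    # Stub: in a real system, IE methods would formulate queries, and
    # previously filled values would strengthen later queries.
    return [Query(cell, f"search terms for {cell}",
                  confidence=0.5 + 0.1 * len(filled))]

def run_query(query):
    # Stub for the web extraction step.
    return f"value-for-{query.cell}"

def greedy_impute(missing_cells):
    """Repeatedly impute the cell whose best query currently has the
    highest confidence; each filled value becomes evidence that can
    raise the confidence of queries for the remaining cells."""
    filled, remaining = {}, set(missing_cells)
    while remaining:
        cell, query = max(
            ((c, q) for c in remaining for q in candidate_queries(c, filled)),
            key=lambda cq: cq[1].confidence,
        )
        filled[cell] = run_query(query)
        remaining.discard(cell)
    return filled

print(greedy_impute({(1, "city"), (2, "phone")}))
```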

Relevance:

30.00%

Publisher:

Abstract:

Background: Accumulated biological research outcomes show that biological functions do not depend on individual genes but on complex gene networks. Microarray data are widely used to cluster genes according to their expression levels across experimental conditions. However, functionally related genes generally do not show coherent expression across all conditions, since any given cellular process is active only under a subset of conditions. Biclustering finds gene clusters that have similar expression levels across a subset of conditions. This paper proposes a seed-based algorithm that identifies coherent genes in an exhaustive but efficient manner.

Methods: To find the biclusters in a gene expression dataset, we exhaustively select combinations of genes and conditions as seeds to create candidate bicluster tables. Each table has two columns: (a) a gene set, and (b) the conditions on which the gene set has dissimilar expression levels to the seed. First, the genes with fewer than the maximum number of dissimilar conditions are identified and a table of these genes is created. Second, the rows that have the same dissimilar conditions are grouped together. Third, the table is sorted in ascending order of the number of dissimilar conditions. Finally, beginning with the first row of the table, a test is run repeatedly to determine whether the cardinality of the gene set in the row exceeds the minimum threshold number of genes in a bicluster. If so, a bicluster is output and the corresponding row is removed from the table. Repeating this process until the table becomes empty systematically identifies all biclusters in the table.

Conclusions: This paper presents a novel biclustering algorithm for the identification of additive biclusters. Because it exhaustively tests combinations of genes and conditions, the additive biclusters can be found more readily.
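
Our reading of the candidate-table procedure as runnable pseudocode. For simplicity this sketch uses single-gene seeds, a fixed tolerance, and a median-offset additive model; these choices are illustrative assumptions rather than the authors' exact definitions:

```python
import numpy as np
from collections import defaultdict

def seed_biclusters(expr, max_dissim=2, min_genes=3, tol=0.5):
    """For each seed gene, group genes by the set of conditions on which
    they differ from the seed by more than `tol` after removing a
    per-gene additive offset; groups with at least `min_genes` members
    form candidate (additive) biclusters."""
    n_genes, n_cond = expr.shape
    biclusters = []
    for seed in range(n_genes):
        table = defaultdict(list)    # dissimilar-condition set -> gene list
        for g in range(n_genes):
            diff = expr[g] - expr[seed]
            diff = diff - np.median(diff)     # additive model: remove offset
            dissim = frozenset(np.flatnonzero(np.abs(diff) > tol))
            if len(dissim) <= max_dissim:
                table[dissim].append(g)
        # process rows with the fewest dissimilar conditions first
        for dissim in sorted(table, key=len):
            genes = table[dissim]
            if len(genes) >= min_genes:
                conds = sorted(set(range(n_cond)) - dissim)
                biclusters.append((genes, conds))
    return biclusters

expr = np.random.rand(50, 10)   # toy gene-expression matrix
print(len(seed_biclusters(expr)))
```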

Relevance:

30.00%

Publisher:

Abstract:

miRDeep and its variants are widely used to quantify known and novel microRNAs (miRNAs) from small RNA sequencing (RNAseq) data. This article describes miRDeep*, our integrated miRNA identification tool, which is modeled on miRDeep but improves the precision of novel miRNA detection by introducing new strategies to identify precursor miRNAs. miRDeep* has a user-friendly graphical interface and accepts raw data in FastQ and Sequence Alignment Map (SAM) or binary equivalent (BAM) format. Known and novel miRNA expression levels, measured by the number of reads, are displayed in an interface that shows each RNAseq read relative to the pre-miRNA hairpin. The secondary pre-miRNA structure and read locations for each predicted miRNA are shown and kept in a separate figure file. Moreover, the target genes of known and novel miRNAs are predicted using the TargetScan algorithm, and the targets are ranked according to their confidence scores. miRDeep* is an integrated standalone application in which sequence alignment, pre-miRNA secondary structure calculation and graphical display are coded purely in Java. The tool can be run on an ordinary personal computer with 1.5 GB of memory. Further, we show that miRDeep* outperformed existing miRNA prediction tools on our LNCaP and other small RNAseq datasets. miRDeep* is freely available online at http://www.australianprostatecentre.org/research/software/mirdeep-star

Relevance:

30.00%

Publisher:

Abstract:

The aim of this paper is to implement a Game-Theory-based offline mission path planner for aerial inspection tasks over large linear infrastructure. Like most real-world optimisation problems, mission path planning involves a number of objectives that ideally should be minimised simultaneously. The goal of this work is therefore to develop a Multi-Objective (MO) optimisation tool able to provide a set of optimal solutions for the inspection task, given the environment data, the mission requirements and the definition of the objectives to minimise. Results indicate the robustness of the method and its ability to find trade-offs among the Pareto-optimal solutions.
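
The backbone of any such MO tool is Pareto dominance; a generic sketch of extracting the non-dominated set from candidate paths, not the paper's game-theoretic formulation:

```python
def pareto_front(solutions):
    """Return the non-dominated candidates, where each candidate is a
    tuple of objective values to minimise (e.g. path length, risk).
    A solution is dominated if some other solution is no worse on every
    objective and strictly better on at least one."""
    return [s for s in solutions
            if not any(all(o <= v for o, v in zip(other, s)) and other != s
                       for other in solutions)]

# Toy objective vectors: (path length, threat exposure)
candidates = [(3, 5), (4, 4), (5, 3), (4, 6), (6, 6)]
print(pareto_front(candidates))   # -> [(3, 5), (4, 4), (5, 3)]
```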

Relevance:

30.00%

Publisher:

Abstract:

With the explosive growth of resources available through the Internet, information mismatching and overload have become severe concerns for users. Web users are commonly overwhelmed by a huge volume of information and face the challenge of finding the most relevant and reliable information in a timely manner. Personalised information gathering and recommender systems represent state-of-the-art tools for the efficient selection of the most relevant and reliable information resources, and interest in such systems has increased dramatically over the last few years. However, web personalisation has not yet been well exploited: difficulties arise when selecting resources through recommender systems, from both technological and social perspectives. Aiming to promote high-quality research to overcome these challenges, this paper provides a comprehensive survey of recent work and achievements in personalised web information gathering and recommender systems. The survey covers concept-based techniques exploited in personalised information gathering and recommender systems.

Relevance:

30.00%

Publisher:

Abstract:

Research has long documented the value that design brings to the innovation of products and services. The research landscape has transformed in the last decade and now reflects the value of design as a different way of thinking that can be applied to the innovation of business models and as a catalyst for strategic growth. This paper presents a case study of gathering deep customer insights through a design-led innovation approach, and reveals industry perspectives on and attitudes towards the value of deep customer insights within the context of a leading Australian airport corporation. The findings highlight that the process of gathering deep customer insights encourages a design-led approach to testing assumptions and developing stronger customer engagement. The richness of the deep customer insights also provided a bridge to future thinking by provoking possible product, service and business innovations aligned with the airport corporation’s vision. The implications of the study reveal how quantitative market data, which captures broad sociocultural trends in ‘how’ and ‘what’ customers interact with within an airport, can be strongly validated and built upon through qualitative deep customer insights that explore ‘why’ those choices to interact are made. Future research is then presented which aims to widely disseminate a design-led approach to innovation among the airport corporation’s internal stakeholders through the development of a digital strategy.

Relevance:

30.00%

Publisher:

Abstract:

Electricity cost has become a major expense for running data centers, and server consolidation using virtualization technology has been adopted as an important means of improving their energy efficiency. In this research, a genetic algorithm and a simulated-annealing algorithm are proposed for the static virtual machine placement problem, which considers the energy consumption of both the servers and the communication network, and a trading algorithm is proposed for dynamic virtual machine placement. Experimental results show that the proposed methods are more energy efficient than existing solutions.
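
To illustrate what an objective covering "both the servers and the communication network" can look like, here is a toy cost function with an assumed two-level topology, a made-up traffic matrix and made-up power figures, none of which come from the paper:

```python
import itertools

VM_CPU = [2, 4, 6, 3]
TRAFFIC = {(0, 1): 5.0, (2, 3): 2.0}         # Mbps between VM pairs
HOST_CAP, P_IDLE, P_PEAK = 8, 100.0, 200.0
P_PER_MBPS_HOP = 0.5                          # network watts per Mbps per hop

def hops(a, b):
    """Hop distance in an assumed two-level topology: same host = 0,
    same rack (hosts paired by index) = 2, different racks = 4."""
    if a == b:
        return 0
    return 2 if a // 2 == b // 2 else 4

def energy(placement):
    loads = {}
    for vm, host in enumerate(placement):
        loads[host] = loads.get(host, 0) + VM_CPU[vm]
    if any(load > HOST_CAP for load in loads.values()):
        return float("inf")                   # infeasible placement
    servers = sum(P_IDLE + (P_PEAK - P_IDLE) * load / HOST_CAP
                  for load in loads.values())
    network = sum(mbps * P_PER_MBPS_HOP * hops(placement[a], placement[b])
                  for (a, b), mbps in TRAFFIC.items())
    return servers + network

# Small enough to enumerate: the best placement keeps chatty VM pairs
# on the same host or at least the same rack.
best = min(itertools.product(range(4), repeat=len(VM_CPU)), key=energy)
print(best, energy(best))
```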

Relevance:

30.00%

Publisher:

Abstract:

The promise of ‘big data’ has generated a great deal of interest in the development of new approaches to research in the humanities and social sciences, as well as a range of important critical interventions warning against an unquestioned rush to ‘big data’. Drawing on experience gained in developing innovative ‘big data’ approaches to social media research, this paper examines some of the repercussions for the scholarly research and publication practices of those researchers who do pursue the path of ‘big data’–centric investigation in their work. As researchers import the tools and methods of highly quantitative, statistical analysis from the ‘hard’ sciences into computational, digital humanities research, must they also subscribe to the language and assumptions underlying such ‘scientificity’? If so, how does this affect the choices made in gathering, processing, analysing and disseminating the outcomes of digital humanities research? In particular, is there a need to rethink the forms and formats of publishing scholarly work in order to enable the rigorous scrutiny and replicability of research outcomes?

Relevance:

30.00%

Publisher:

Abstract:

Bioacoustic data can provide an important basis for environmental monitoring. To explore the large volume of field recordings collected, an automated similarity search algorithm is presented in this paper. A region of an audio recording, defined by frequency and time bounds, is provided by a user; the content of the region is used to construct a query. In the retrieval process, our algorithm automatically scans through recordings to search for similar regions. Specifically, we present a feature extraction approach based on the visual content of vocalisations, in this case ridges, and develop a generic regional representation of vocalisations for indexing. Our feature extraction method works best for bird vocalisations showing ridge characteristics. The regional representation method allows the content of an arbitrary region of a continuous recording to be described in a compressed format.
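
A generic sketch of region-based querying over a spectrogram using normalised cross-correlation; the paper's actual features are ridge-based, which this simplification does not reproduce, and the spectrogram here is synthetic:

```python
import numpy as np

def spectrogram_region(spec, t0, t1, f0, f1):
    """Cut a query region out of a spectrogram using time/frequency bin
    bounds, mimicking the user-marked region described above."""
    return spec[f0:f1, t0:t1]

def scan_similarity(spec, query, f0, step=5):
    """Slide the query along the time axis at frequency offset f0 and
    score each offset by normalised cross-correlation."""
    fh, tw = query.shape
    q = (query - query.mean()) / (query.std() + 1e-12)
    scores = []
    for t in range(0, spec.shape[1] - tw + 1, step):
        win = spec[f0:f0 + fh, t:t + tw]
        w = (win - win.mean()) / (win.std() + 1e-12)
        scores.append((float((q * w).mean()), t))
    return sorted(scores, reverse=True)[:5]   # best-matching time offsets

spec = np.random.rand(128, 2000)              # toy long recording
query = spectrogram_region(spec, 100, 140, 20, 60)
print(scan_similarity(spec, query, f0=20)[0]) # recovers offset t = 100
```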

Relevance:

30.00%

Publisher:

Abstract:

Spatial organisation of proteins according to their function plays an important role in the specificity of their molecular interactions. Emerging proteomics methods seek to assign proteins to sub-cellular locations through partial separation of organelles and computational analysis of protein abundance distributions among the partially separated fractions. Such methods permit simultaneous analysis of unpurified organelles and promise proteome-wide localisation in scenarios in which perturbation may prompt dynamic redistribution. A possible shortcoming is the difficulty of resolving organelles that display similar behaviour during a protocol designed to provide only partial enrichment. We employ the Localisation of Organelle Proteins by Isotope Tagging (LOPIT) organelle proteomics platform to demonstrate that combining information from distinct separations of the same material can improve organelle resolution and the assignment of proteins to sub-cellular locations. Two previously published experiments, whose distinct gradients alone are unable to fully resolve six known protein-organelle groupings, are subjected to a rigorous analysis to assess protein-organelle association via a contemporary pattern recognition algorithm. Upon straightforward combination of the single-gradient data, we observe a significant improvement in protein-organelle association via both a non-linear support vector machine algorithm and partial least-squares discriminant analysis. The outcome yields suggestions for further improvements to present organelle proteomics platforms, and a robust analytical methodology via which to associate proteins with sub-cellular organelles.
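
A minimal sketch of the gradient-combination idea on synthetic data, using scikit-learn's SVM: concatenating the per-fraction abundance profiles from two separations gives the classifier more discriminative structure than either alone. The class count, noise level and data are illustrative assumptions, not the published analysis:

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)
n, frac = 300, 8                       # proteins, fractions per gradient
labels = rng.integers(0, 6, n)         # six organelle classes
centres1 = rng.normal(size=(6, frac))  # class-mean profiles, gradient 1
centres2 = rng.normal(size=(6, frac))  # class-mean profiles, gradient 2
grad1 = centres1[labels] + 0.8 * rng.normal(size=(n, frac))
grad2 = centres2[labels] + 0.8 * rng.normal(size=(n, frac))

svm = SVC(kernel="rbf", C=1.0)         # non-linear SVM, as in the abstract
for name, X in [("gradient 1", grad1),
                ("gradient 2", grad2),
                ("combined", np.hstack([grad1, grad2]))]:
    acc = cross_val_score(svm, X, labels, cv=5).mean()
    print(f"{name}: {acc:.2f}")        # combined profiles score highest
```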